Risk Analytics

Master in Management, HEC Lausanne

Authors
Affiliations

Quentin Besson

quentin.besson@unil.ch

Marc Bourleau

marc.bourleau@unil.ch

Charlène Khairallah Milosevic

charlene.khairallah@unil.ch

Selena Milosevic

selena.milosevic@unil.ch

Published

December 18, 2023

1 Practical 1

1.1 Part 1: financial returns and normality

1.1.1 (a) Read in the data and assess the stationarity of the raw stock indices.

We are going to use two methods. The first is to visualize the stock indices over time, here with interactive plotly charts, to get a sense of whether they appear stationary.

Code
# Create interactive plots for each index
plot_sp500 <- plot_ly(x = sp500.df$DATE, y = sp500.df$SP500, type = "scatter", mode = "lines", name = "SP500") %>%
  layout(title = "SP500")

plot_cac40 <- plot_ly(x = cac40.df$DATE, y = cac40.df$CAC40, type = "scatter", mode = "lines", name = "CAC40") %>%
  layout(title = "CAC40")

plot_nasdaq <- plot_ly(x = nasdaq.df$DATE, y = nasdaq.df$NASDAQ, type = "scatter", mode = "lines", name = "Nasdaq") %>%
  layout(title = "Nasdaq")

plot_nikkei <- plot_ly(x = nikkei.df$DATE, y = nikkei.df$NIKKEI, type = "scatter", mode = "lines", name = "NIKKEI") %>%
  layout(title = "NIKKEI")

combined_plot <- subplot(plot_sp500, plot_cac40, plot_nasdaq, plot_nikkei, nrows = 2)

# Set a common title for the entire grid
combined_plot <- layout(combined_plot, title = "Comparison of Stock Indices")

# Show the combined plot
combined_plot

It appears that none of the stock indices are stationary based on the plots generated. These plots clearly show the presence of trends, which is a strong indicator of non-stationary data. For instance, the NASDAQ index exhibits a noticeable upward trend between 1999 and the beginning of 2000.

Another common approach is the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the time series is non-stationary. If the p-value is below the chosen significance level (here 0.05), we reject the null hypothesis and consider the series stationary.

Code
# ADF - Test 
adf_test_sp500 <- adf.test(sp500$SP500) ; adf_test_sp500
## 
##  Augmented Dickey-Fuller Test
## 
## data:  sp500$SP500
## Dickey-Fuller = -1.212849, Lag order = 15, p-value = 0.904429
## alternative hypothesis: stationary
adf_test_nasdaq <- adf.test(nasdaq$NASDAQ) ; adf_test_nasdaq
## 
##  Augmented Dickey-Fuller Test
## 
## data:  nasdaq$NASDAQ
## Dickey-Fuller = -1.570559, Lag order = 13, p-value = 0.760138
## alternative hypothesis: stationary
adf_test_cac40 <- adf.test(cac40$CAC40) ; adf_test_cac40
## 
##  Augmented Dickey-Fuller Test
## 
## data:  cac40$CAC40
## Dickey-Fuller = -0.7332045, Lag order = 13, p-value = 0.96759
## alternative hypothesis: stationary
adf_test_nikkei <- adf.test(nikkei$NIKKEI) ; adf_test_nikkei
## 
##  Augmented Dickey-Fuller Test
## 
## data:  nikkei$NIKKEI
## Dickey-Fuller = -2.419329, Lag order = 13, p-value = 0.400792
## alternative hypothesis: stationary

We cannot reject any of the null hypotheses, meaning we do not have enough evidence to conclude that the raw series are stationary.
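As a minimal sketch of why differencing helps here (anticipating the transformation in the next question), the same ADF test can be applied before and after taking first differences. Simulated data stands in for the actual index files, which are not reproduced in this block; with the real data, `prices` would be replaced by e.g. `sp500$SP500`.

```r
# Hedged sketch: differencing a trending series typically restores stationarity.
library(tseries)

set.seed(42)
prices <- cumsum(rnorm(2000, mean = 0.03, sd = 1)) + 100  # random walk with drift

adf.test(prices)        # typically a large p-value: non-stationarity not rejected
adf.test(diff(prices))  # typically p-value = 0.01: differenced series looks stationary
```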

1.1.2 (b) Create a function to transform the daily stock indices into their daily negative log returns counterparts. Plot the latter series and assess their stationarity. To compare the series, also plot the negative log returns on a common scale to all indices.

Code
# Define a function to calculate negative log returns
calculate_log_returns <- function(stocks) {
  log_returns <- -diff(log(stocks))
  return(log_returns)
}
# Calculate negative log returns for each index
log_returns_sp500 <- calculate_log_returns(sp500) %>% na.omit()
log_returns_cac40 <- calculate_log_returns(cac40) %>% na.omit()
log_returns_nasdaq <- calculate_log_returns(nasdaq) %>% na.omit()
log_returns_nikkei <- calculate_log_returns(nikkei) %>% na.omit()
# We need to transform the index of rows as a separate variable to include the date into the plots: we create a function
log_returns_sp500$Dates <- rownames(log_returns_sp500)
log_returns_sp500.df <- data.frame(log_returns_sp500)
log_returns_cac40$Dates <- rownames(log_returns_cac40)
log_returns_cac40.df <- data.frame(log_returns_cac40)
log_returns_nasdaq$Dates <- rownames(log_returns_nasdaq)
log_returns_nasdaq.df <- data.frame(log_returns_nasdaq)
log_returns_nikkei$Dates <- rownames(log_returns_nikkei)
log_returns_nikkei.df <- data.frame(log_returns_nikkei)
# Plot for SP500
plot_sp5002 <- plot_ly(x = log_returns_sp500.df$Dates, y = log_returns_sp500, type = "scatter", mode = "lines", name = "SP500") %>% layout(title = "SP500 Negative Log Returns", xaxis = list(title = "Time"), yaxis = list(title = "Negative Log Returns"), shapes = list(list(type = "line", x0 = 1, x1 = length(log_returns_sp500), y0 = 0, y1 = 0, line = list(color = "red"))))

# Plot for CAC40
plot_cac402 <- plot_ly(x = log_returns_cac40.df$Dates, y = log_returns_cac40, type = "scatter", mode = "lines", name = "CAC40") %>% layout(title = "CAC40 Negative Log Returns", xaxis = list(title = "Time"), yaxis = list(title = "Negative Log Returns"), shapes = list(list(type = "line", x0 = 1, x1 = length(log_returns_cac40), y0 = 0, y1 = 0, line = list(color = "red"))))

# Plot for Nasdaq
plot_nasdaq2 <- plot_ly(x = log_returns_nasdaq.df$Dates , y = log_returns_nasdaq, type = "scatter", mode = "lines", name = "Nasdaq") %>% layout(title = "Nasdaq Negative Log Returns", xaxis = list(title = "Time"), yaxis = list(title = "Negative Log Returns"), shapes = list(list(type = "line", x0 = 1, x1 = length(log_returns_nasdaq), y0 = 0, y1 = 0, line = list(color = "red"))))

# Plot for NIKKEI
plot_nikkei2 <- plot_ly(x = log_returns_nikkei.df$Dates , y = log_returns_nikkei, type = "scatter", mode = "lines", name = "NIKKEI") %>% layout(title = "Negative Log Returns", xaxis = list(title = "Time"), yaxis = list(title = "Negative Log Returns"), shapes = list(list(type = "line", x0 = 1, x1 = length(log_returns_nikkei), y0 = 0, y1 = 0, line = list(color = "red"))))

# Create a subplot with the list of individual plots
combined_plot2 <- subplot(plot_sp5002, plot_cac402, plot_nasdaq2, plot_nikkei2, nrows = 2)
combined_plot2 <- layout(combined_plot2, title = "Comparison of negative log return of Stock Indices (same scale)")
# Show the combined plot
combined_plot2

Looking at the comparison of the negative log returns of the stock indices, we discern the following:

  • The mean is not perfectly constant: the series are not entirely centered around 0, suggesting a long-term trend in the indices.

  • The variance is not constant: calm periods alternate with bursts of volatility, and the SP500 shows very large spikes, such as the one on 1997-10-27.

Therefore, graphically, the series still seem non-stationary, at least in their variance.

1.1.3 (c) Draw histograms of the negative log returns and compare them to the Normal distribution. What do you observe?

Code
# Set the number of histogram bins
num_bins <- 30
# Create a common scale for the histograms
par(mfrow = c(2, 2))  # Create a 2x2 grid of plots

# Plot histograms and overlay Normal distribution curves
hist(log_returns_sp500, breaks = num_bins, main = "Histogram - SP500 Log Returns", xlab = "Log Returns", probability = TRUE, xlim = c(-0.1, 0.1))
curve(dnorm(x, mean = mean(log_returns_sp500), sd = sd(log_returns_sp500)), col = "red", lwd = 2, add = TRUE)

hist(log_returns_cac40, breaks = num_bins, main = "Histogram - CAC40 Log Returns", xlab = "Log Returns", probability = TRUE, xlim = c(-0.1, 0.1))
curve(dnorm(x, mean = mean(log_returns_cac40), sd = sd(log_returns_cac40)), col = "red", lwd = 2, add = TRUE)

hist(log_returns_nasdaq, breaks = num_bins, main = "Histogram - Nasdaq Log Returns", xlab = "Log Returns", probability = TRUE, xlim = c(-0.1, 0.1))
curve(dnorm(x, mean = mean(log_returns_nasdaq), sd = sd(log_returns_nasdaq)), col = "red", lwd = 2, add = TRUE)

hist(log_returns_nikkei, breaks = num_bins, main = "Histogram - NIKKEI Log Returns", xlab = "Log Returns", probability = TRUE, xlim = c(-0.1, 0.1))
curve(dnorm(x, mean = mean(log_returns_nikkei), sd = sd(log_returns_nikkei)), col = "red", lwd = 2, add = TRUE)

Code

# Reset the plot layout
par(mfrow = c(1, 1))

In a perfectly normal distribution, the data would closely follow the red curve. The histograms of the indices’ negative log returns deviate from the overlaid normal density, with more mass near the center and in the tails than a normal distribution would allow, suggesting heavy tails and some skewness. Such deviations hold significant implications for the Black-Scholes formula, which assumes normally distributed returns and constant volatility. If returns exhibit heavy tails, as suggested by the histograms, the Black-Scholes model may not accurately capture the risk of extreme price movements, thereby potentially mispricing options that rely on these assumptions.

To validate these observations, further analytical methods will be employed later in this document to rigorously assess the distributional characteristics of the stock indices’ returns.
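One simple way to quantify the departures visible in the histograms is to compute sample skewness and excess kurtosis, which can be done in base R without extra packages. The sketch below uses simulated heavy-tailed returns as a stand-in; with the real data, one would pass the negative log return series instead.

```r
# Hedged sketch: sample skewness and excess kurtosis in base R.
# Excess kurtosis > 0 signals tails heavier than the normal distribution.
moments_summary <- function(x) {
  x <- x[is.finite(x)]
  z <- (x - mean(x)) / sd(x)
  c(skewness = mean(z^3), excess_kurtosis = mean(z^4) - 3)
}

set.seed(1)
fake_returns <- rt(5000, df = 4) * 0.01  # heavy-tailed stand-in for index returns
moments_summary(fake_returns)            # excess kurtosis clearly above 0
```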

1.1.4 (d) Check the normality assumption of the negative log returns using QQ-plots. What is your conclusion?

Code
# Create QQ-plots for log returns

par(mfrow = c(2, 2))  # Create a 2x2 grid of plots

qqnorm(log_returns_sp500, main = "QQ-Plot - SP500 Log Returns")
qqline(log_returns_sp500)
qqnorm(log_returns_cac40, main = "QQ-Plot - CAC40 Log Returns")
qqline(log_returns_cac40)
qqnorm(log_returns_nasdaq, main = "QQ-Plot - Nasdaq Log Returns")
qqline(log_returns_nasdaq)
qqnorm(log_returns_nikkei, main = "QQ-Plot - NIKKEI Log Returns")
qqline(log_returns_nikkei)

The QQ-plots for the negative log returns across various stock indices exhibit noticeable deviations from the expected diagonal line, particularly within the tails, which indicates a departure from normal distribution, with a tendency towards more pronounced extreme movements. The Nasdaq index, for instance, distinctly shows this trend with an increased frequency of significant negative returns. The differing scales of the plots also reveal a variable degree of volatility among the indices, with the Nasdaq displaying particularly stark fluctuations. This evidence challenges the assumption of normally distributed returns integral to the Black-Scholes model, suggesting potential inaccuracies in option pricing that relies on this assumption.

In conclusion, the observed anomalies in the QQ-plots signal that the normality assumption critical to the Black-Scholes framework is violated.

1.1.5 (e) Formally test the normality assumption of the negative log returns using an Anderson-Darling testing procedure. Do you reject the Normal hypothesis?

Code
# Perform Anderson-Darling test for normality
ad_test_sp500 <- ad.test(log_returns_sp500)
ad_test_cac40 <- ad.test(log_returns_cac40)
ad_test_nasdaq <- ad.test(log_returns_nasdaq)
ad_test_nikkei <- ad.test(log_returns_nikkei)

# Print the results
print(ad_test_sp500)
## 
##  Anderson-Darling normality test
## 
## data:  log_returns_sp500
## A = 29.23992, p-value < 2.22e-16
print(ad_test_cac40)
## 
##  Anderson-Darling normality test
## 
## data:  log_returns_cac40
## A = 10.3297, p-value < 2.22e-16
print(ad_test_nasdaq)
## 
##  Anderson-Darling normality test
## 
## data:  log_returns_nasdaq
## A = 29.42786, p-value < 2.22e-16
print(ad_test_nikkei)
## 
##  Anderson-Darling normality test
## 
## data:  log_returns_nikkei
## A = 8.189858, p-value < 2.22e-16

Upon conducting the Anderson-Darling test for normality on the negative log returns of the stock indices, we obtain p-values that are effectively zero (p-value < 2.2e-16), providing strong evidence to reject the null hypothesis of normality.

In summary, log returns of the indices are not normally distributed, violating one of the key assumptions behind the Black-Scholes formula.
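The Anderson-Darling verdict can be cross-checked with the Jarque-Bera test from the tseries package, which targets skewness and kurtosis directly. This is only a sketch on simulated data; with the real series, the negative log returns would be passed in place of the simulated vectors.

```r
# Hedged sketch: Jarque-Bera as a second formal normality test.
library(tseries)

set.seed(7)
normal_data  <- rnorm(3000)
heavy_tailed <- rt(3000, df = 4)

jarque.bera.test(normal_data)   # large p-value expected: normality not rejected
jarque.bera.test(heavy_tailed)  # tiny p-value expected: normality rejected
```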

1.1.6 (f) Use the fitdistr() function from the MASS package in order to obtain the (maximum-likelihood estimated) parameters of distributions you could imagine for the negative log returns. Try to fit at least two different distributions on the data and, using an information criteria (such as the AIC), decide which distributional framework fits best for each of the series.

Code
# Fit distributions : normal
fit_sp500 <- fitdistr(log_returns_sp500, "normal")
fit_cac40 <- fitdistr(log_returns_cac40, "normal")
fit_nasdaq <- fitdistr(log_returns_nasdaq, "normal")
fit_nikkei <- fitdistr(log_returns_nikkei, "normal")

# Fit a t-distribution to log returns
fit_sp500_t <- fitdistr(log_returns_sp500, "t")
fit_cac40_t <- fitdistr(log_returns_cac40, "t")
fit_nasdaq_t <- fitdistr(log_returns_nasdaq, "t")
fit_nikkei_t <- fitdistr(log_returns_nikkei, "t")

# Calculate AIC for each distribution
aic_sp500 <- AIC(fit_sp500)
aic_cac40 <- AIC(fit_cac40)
aic_nasdaq <- AIC(fit_nasdaq)
aic_nikkei <- AIC(fit_nikkei)

# Calculate AIC for each distribution
aic_sp500_t <- AIC(fit_sp500_t)
aic_cac40_t <- AIC(fit_cac40_t)
aic_nasdaq_t <- AIC(fit_nasdaq_t)
aic_nikkei_t <- AIC(fit_nikkei_t)

# Create a data frame to store AIC values
aic_data <- data.frame(
  Index = c("SP500", "CAC40", "NASDAQ", "NIKKEI"),
  Normal = c(aic_sp500, aic_cac40, aic_nasdaq, aic_nikkei),
  T_Distribution = c(aic_sp500_t, aic_cac40_t, aic_nasdaq_t, aic_nikkei_t)
)

# Print the AIC data frame
print(aic_data)
##    Index      Normal T_Distribution
## 1  SP500 -22510.4993    -22944.6649
## 2  CAC40 -14447.5481    -14626.0090
## 3 NASDAQ -13384.9374    -13766.5445
## 4 NIKKEI -14070.5771    -14216.9075

# Create a bar plot of AIC values with improved aesthetics
barplot(
  t(as.matrix(aic_data[, -1])),
  beside = TRUE,
  col = c("#1f77b4", "#2ca02c"), # Changed to more appealing colors
  names.arg = aic_data$Index,
  main = "AIC Comparison of Distributions",
  xlab = "Index",
  ylab = "AIC Value",
  cex.names = 0.8,    
  cex.axis = 0.8,    
  cex.main = 1,      
  cex.lab = 0.9,      
  las = 1,            
  border = NA,        
  legend.text = c("Normal", "T-Distribution"),
  args.legend = list(x = "bottomright", bty = "n", cex = 0.8)
)

# You can also add a grid for better readability
grid(nx = NA, ny = NULL, col = "gray", lty = "dotted", lwd = par("lwd"))

Upon fitting both normal and t-distributions to the data and evaluating them with the Akaike Information Criterion (AIC), we note that the two fits yield AIC values of a similar order of magnitude. However, the t-distribution consistently offers a better fit for all four indices, as indicated by its lower AIC values. This suggests that while the normal distribution provides a rough representation of the data, the Student’s t-distribution loses less information about the underlying structure, capturing the tails of the distributions more effectively.

As a result, we will continue analyzing the indices with the Student’s t-distribution.
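For intuition, the AIC values in the table above can be recomputed by hand as AIC = 2k − 2 log L, where k is the number of estimated parameters and log L the maximized log-likelihood stored in the `fitdistr()` fit. The sketch below uses simulated returns in place of the real series.

```r
# Hedged sketch: recomputing AIC manually from fitdistr() fits.
library(MASS)

set.seed(3)
x <- rt(2000, df = 5) * 0.01  # simulated stand-in for negative log returns

fit_norm <- fitdistr(x, "normal")  # k = 2 (mean, sd)
fit_t    <- fitdistr(x, "t")       # k = 3 (location, scale, df)

manual_aic <- function(fit) 2 * length(fit$estimate) - 2 * as.numeric(logLik(fit))
c(normal = manual_aic(fit_norm), t = manual_aic(fit_t))  # matches AIC(fit_norm), AIC(fit_t)
```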

1.1.7 (g) If this has not been done in (f), fit a t-distribution to the negative log returns using fitdistr(). Using a QQ-plot for each of the series, decide whether the fit is better than with a Normal distribution, based on your answer in (d).

Code

# Set up the plotting area for a 4x2 grid
par(mfrow = c(4, 2), mar = c(4, 4, 2, 1), oma = c(0, 0, 3, 0))

# Normal distribution QQ plots
qqnorm(log_returns_sp500, main = "Normal QQ-Plot - SP500")
qqline(log_returns_sp500)
qqnorm(log_returns_cac40, main = "Normal QQ-Plot - CAC40")
qqline(log_returns_cac40)
qqnorm(log_returns_nasdaq, main = "Normal QQ-Plot - NASDAQ")
qqline(log_returns_nasdaq)
qqnorm(log_returns_nikkei, main = "Normal QQ-Plot - NIKKEI")
qqline(log_returns_nikkei)

# t-distribution QQ plots
qqplot(qt(ppoints(log_returns_sp500), df=fit_sp500_t$estimate["df"]), log_returns_sp500, 
       main = "t-dist QQ-Plot - SP500", xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
abline(0, 1, col = "red")
qqplot(qt(ppoints(log_returns_cac40), df=fit_cac40_t$estimate["df"]), log_returns_cac40, 
       main = "t-dist QQ-Plot - CAC40", xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
abline(0, 1, col = "red")
qqplot(qt(ppoints(log_returns_nasdaq), df=fit_nasdaq_t$estimate["df"]), log_returns_nasdaq, 
       main = "t-dist QQ-Plot - NASDAQ", xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
abline(0, 1, col = "red")
qqplot(qt(ppoints(log_returns_nikkei), df=fit_nikkei_t$estimate["df"]), log_returns_nikkei, 
       main = "t-dist QQ-Plot - NIKKEI", xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
abline(0, 1, col = "red")

Code

# Reset the plotting area
par(mfrow = c(1, 1))

Upon examining the QQ plots, the data points in the t-distribution QQ plots adhere more closely to the reference line, particularly in the tails, suggesting that the t-distribution provides a superior fit for the negative log returns of the indices compared to the normal distribution. This improved alignment, especially in the tails, indicates that the t-distribution more accurately captures the behavior of extreme market movements.

1.2 Part 2: Financial time series, volatility and the random walk hypothesis

1.2.1 (a) Plot the ACF of all the series in Part 1 (i.e. the raw series as well as the negative log returns). What do you observe?

Code
##############################################################  RAW SERIES ############################################################### 
par(mfrow = c(2, 2))  # Create a 2x2 grid of plots
par(oma = c(0, 0, 3, 0)) # Adjust the outer margins to make space for the title

# Plot ACF for the S&P 500 index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(sp500, main = "ACF of S&P 500 Index")

# Plot ACF for the CAC 40 index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(cac40, main = "ACF of CAC 40 Index")

# Plot ACF for the NASDAQ index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(nasdaq, main = "ACF of NASDAQ Index")

# Plot ACF for the Nikkei index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(nikkei, main = "ACF of Nikkei Index")

# Add the main title
mtext("ACF of raw series", outer = TRUE, cex = 1.5, line = 1)

Code

##############################################################  NEGATIVE LOG RETURNS ################################################################ 

# Setting up the 2x2 plot area with adjusted outer margins for the title
par(mfrow = c(2, 2), oma = c(0, 0, 2, 0))

# Plot ACF for the S&P 500 index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(log_returns_sp500, main = "ACF of S&P 500 Index")

# Plot ACF for the CAC 40 index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(log_returns_cac40, main = "ACF of CAC 40 Index")

# Plot ACF for the NASDAQ index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(log_returns_nasdaq, main = "ACF of NASDAQ Index")

# Plot ACF for the Nikkei index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(log_returns_nikkei, main = "ACF of Nikkei Index")

# Add the central title
mtext("ACF of Negative Log Returns", outer = TRUE, cex = 1.5, line = 0)

The Autocorrelation Function (ACF) plots for both the raw series and the negative log returns reveal distinct patterns. For the raw series, the ACF shows a strong, consistent autocorrelation at all lags, indicating a non-stationary process where past values have a persistent influence on future values.

On the other hand, the ACF plots for the negative log returns show most autocorrelations within the confidence bands, implying no significant autocorrelation at any lag. This suggests that the log returns are closer to a stationary process, despite some lags being outside of the confidence bands, indicating that there may be occasional and minor deviations from strict stationarity.
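A classic refinement of this observation, which Part (d) exploits, is that raw returns show little autocorrelation while their absolute (or squared) values do: this is volatility clustering. The sketch below illustrates the contrast on simulated GARCH-style data; with the real data, `returns` would be replaced by a negative log return series.

```r
# Hedged sketch: uncorrelated returns vs. autocorrelated absolute returns.
set.seed(11)
n <- 3000
sigma2 <- numeric(n); returns <- numeric(n)
sigma2[1] <- 1e-4
for (t in 2:n) {  # simple GARCH(1,1)-style volatility recursion
  sigma2[t]  <- 1e-6 + 0.1 * returns[t - 1]^2 + 0.85 * sigma2[t - 1]
  returns[t] <- sqrt(sigma2[t]) * rnorm(1)
}

par(mfrow = c(1, 2))
acf(returns,      main = "Raw returns")  # mostly inside the bands
acf(abs(returns), main = "Abs returns")  # slowly decaying positive ACF
par(mfrow = c(1, 1))
```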

1.2.2 (b) Use a Ljung-Box procedure to formally test for (temporal) serial dependence in the series. What is your conclusion?

Code
# For Raw Data
box_test_sp500 <- Box.test(sp500, type = "Ljung-Box")
box_test_cac40 <- Box.test(cac40, type = "Ljung-Box")
box_test_nasdaq <- Box.test(nasdaq, type = "Ljung-Box")
box_test_nikkei <- Box.test(nikkei, type = "Ljung-Box")

# For Negative Log Returns
box_test_log_sp500 <- Box.test(log_returns_sp500, type = "Ljung-Box")
box_test_log_cac40 <- Box.test(log_returns_cac40, type = "Ljung-Box")
box_test_log_nasdaq <- Box.test(log_returns_nasdaq, type = "Ljung-Box")
box_test_log_nikkei <- Box.test(log_returns_nikkei, type = "Ljung-Box")

# Print p-values
cat("Raw SP500 Ljung-Box Test - p-value:", box_test_sp500$p.value, "\n")
## Raw SP500 Ljung-Box Test - p-value: 0
cat("Raw CAC40 Ljung-Box Test - p-value:", box_test_cac40$p.value, "\n")
## Raw CAC40 Ljung-Box Test - p-value: 0
cat("Raw NASDAQ Ljung-Box Test - p-value:", box_test_nasdaq$p.value, "\n")
## Raw NASDAQ Ljung-Box Test - p-value: 0
cat("Raw NIKKEI Ljung-Box Test - p-value:", box_test_nikkei$p.value, "\n")
## Raw NIKKEI Ljung-Box Test - p-value: 0

cat("Log Returns SP500 Ljung-Box Test - p-value:", box_test_log_sp500$p.value, "\n")
## Log Returns SP500 Ljung-Box Test - p-value: 0.932638483
cat("Log Returns CAC40 Ljung-Box Test - p-value:", box_test_log_cac40$p.value, "\n")
## Log Returns CAC40 Ljung-Box Test - p-value: 0.495130124
cat("Log Returns NASDAQ Ljung-Box Test - p-value:", box_test_log_nasdaq$p.value, "\n")
## Log Returns NASDAQ Ljung-Box Test - p-value: 0.409219414
cat("Log Returns NIKKEI Ljung-Box Test - p-value:", box_test_log_nikkei$p.value, "\n")
## Log Returns NIKKEI Ljung-Box Test - p-value: 0.0513979989

For the raw stock market indices, the results indicate strong evidence of serial dependence, with p-values significantly below 0.05 (<2.2e-16). This suggests that there is a temporal correlation in the raw stock market data, indicating that past observations influence future values.

However, when applying the Ljung-Box test to the negative log returns of these indices, the results show no significant serial dependence at the 5% level: the p-values range from 0.051 to 0.933. This implies that after transforming prices into log returns, the temporal correlation is greatly reduced, bringing the series much closer to a stationary process. The Nikkei index stands out with a p-value of 0.051, just above the threshold, indicating a comparatively higher degree of autocorrelation in its log returns than the other indices.

1.2.3 (c) Propose ARIMA models for each of the negative log returns series, based on visualisation tools (e.g. ACF and PACF). Select an ARIMA model using auto.arima() to each of the negative log returns series. Comment on the difference. Assess the residuals of the resulting models.

Code
########## PACF gives AR --> p is the order of AR component #################### 
par(mfrow = c(2, 2))  # Create a 2x2 grid of plots
par(oma = c(0, 0, 1, 0))


par(mar = c(5, 4, 3, 2))
pacf(log_returns_sp500)

par(mar = c(5, 4, 3, 2))
pacf(log_returns_cac40)

par(mar = c(5, 4, 3, 2))
pacf(log_returns_nasdaq)

par(mar = c(5, 4, 3, 2))
pacf(log_returns_nikkei)
# Add the central title
mtext("PACF of Negative Log Returns", outer = TRUE, cex = 1.5, line = -0.3)

Code
########## ACF gives MA --> q is the order of the MA component #################### 
# Setting up the 2x2 plot area with adjusted outer margins for the title
par(mfrow = c(2, 2), oma = c(0, 0, 2, 0))

# Plot ACF for the S&P 500 index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(log_returns_sp500, main = "ACF of S&P 500 Index")

# Plot ACF for the CAC 40 index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(log_returns_cac40, main = "ACF of CAC 40 Index")

# Plot ACF for the NASDAQ index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(log_returns_nasdaq, main = "ACF of NASDAQ Index")

# Plot ACF for the Nikkei index with adjusted margins
par(mar = c(5, 4, 3, 2))
acf(log_returns_nikkei, main = "ACF of Nikkei Index")

# Add the central title
mtext("ACF of Negative Log Returns", outer = TRUE, cex = 1.5, line = 0)

Code
################## ADF TEST to see if need to differentiate or not ################## 
# ADF test for SP500 log returns
adf_sp500 <- adf.test(log_returns_sp500)
print(paste("P-value for SP500:", adf_sp500$p.value))
## [1] "P-value for SP500: 0.01"

# ADF test for CAC40 log returns
adf_cac40 <- adf.test(log_returns_cac40)
print(paste("P-value for CAC40:", adf_cac40$p.value))
## [1] "P-value for CAC40: 0.01"

# ADF test for NASDAQ log returns
adf_nasdaq <- adf.test(log_returns_nasdaq)
print(paste("P-value for NASDAQ:", adf_nasdaq$p.value))
## [1] "P-value for NASDAQ: 0.01"

# ADF test for NIKKEI log returns
adf_nikkei <- adf.test(log_returns_nikkei)
print(paste("P-value for NIKKEI:", adf_nikkei$p.value))
## [1] "P-value for NIKKEI: 0.01"

As the p-values are smaller than 0.05 in the ADF tests, we do not need to difference the series: the null hypothesis of non-stationarity is rejected, suggesting that the return series are already stationary (so d = 0 in the ARIMA models).
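The choice of d can also be cross-checked with `ndiffs()` from the forecast package, which estimates the number of differences needed for stationarity. The sketch below uses a simulated geometric random walk in place of the real index; with the actual data, `prices` and `returns` would be the raw series and its negative log returns.

```r
# Hedged sketch: ndiffs() as a cross-check on the differencing order.
library(forecast)

set.seed(9)
prices  <- 100 * exp(cumsum(rnorm(2000, sd = 0.01)))  # geometric random walk
returns <- -diff(log(prices))

ndiffs(prices)   # typically 1: the level series needs one difference
ndiffs(returns)  # typically 0: returns need no further differencing
```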

Code
############################## CHOOSING OUR OWN ORDER ############################## 
# STEP 1: own arima
##ARIMA SP500: order is p, d and q  
arima_sp500 <- Arima(log_returns_sp500, order = c(8,0,7)) # AIC of -22527
arima_sp500
## Series: log_returns_sp500 
## ARIMA(8,0,7) with non-zero mean 
## 
## Coefficients:
##            ar1        ar2      ar3        ar4       ar5      ar6
##       0.475007  -0.230146  0.35635  -0.066815  0.110447  0.19875
## s.e.  0.101402   0.102107  0.11911   0.113078  0.111588  0.11132
##             ar7       ar8        ma1       ma2        ma3       ma4
##       -0.791668  0.045867  -0.479227  0.215558  -0.377897  0.084015
## s.e.   0.088481  0.020247   0.100359  0.108722   0.121459  0.116628
##             ma5        ma6       ma7       mean
##       -0.154838  -0.175357  0.771575  -0.000318
## s.e.   0.115527   0.112882  0.096961   0.000171
## 
## sigma^2 = 0.000108777:  log likelihood = 11285.55
## AIC=-22537.1   AICc=-22536.93   BIC=-22431.96
## ARIMA CAC40
arima_cac40 <- Arima(log_returns_cac40, order = c(5,0,5)) # AIC of -14458
arima_cac40
## Series: log_returns_cac40 
## ARIMA(5,0,5) with non-zero mean 
## 
## Coefficients:
##            ar1       ar2        ar3       ar4        ar5        ma1
##       0.962970  0.287472  -0.347933  0.056554  -0.244987  -0.953959
## s.e.  0.119875  0.178070   0.166782  0.165495        NaN   0.120519
##             ma2       ma3       ma4       ma5       mean
##       -0.323555  0.314450  0.028540  0.212973  -0.000159
## s.e.   0.176153  0.169551  0.166043       NaN   0.000280
## 
## sigma^2 = 0.000212653:  log likelihood = 7241.06
## AIC=-14458.13   AICc=-14458.01   BIC=-14387.88
## ARIMA NASDAQ
arima_nasdaq <- Arima(log_returns_nasdaq, order = c(7,0,7))# AIC of -13412
arima_nasdaq
## Series: log_returns_nasdaq 
## ARIMA(7,0,7) with non-zero mean 
## 
## Coefficients:
##            ar1        ar2       ar3       ar4        ar5       ar6
##       0.244385  -0.247321  0.141274  0.125001  -0.144992  0.031856
## s.e.  0.150974   0.187830  0.218332  0.220995   0.187704  0.156771
##             ar7        ma1       ma2        ma3        ma4       ma5
##       -0.814958  -0.223783  0.206986  -0.124798  -0.131296  0.097960
## s.e.   0.136203   0.147045  0.181585   0.203839   0.202995  0.171941
##             ma6       ma7       mean
##       -0.031123  0.835889  -0.000365
## s.e.   0.149967  0.130248   0.000344
## 
## sigma^2 = 0.00031862:  log likelihood = 6721.84
## AIC=-13411.69   AICc=-13411.48   BIC=-13318.02
## ARIMA NIKKEI
arima_nikkei <- Arima(log_returns_nikkei, order = c(1,0,1)) # AIC of -14075
arima_nikkei
## Series: log_returns_nikkei 
## ARIMA(1,0,1) with non-zero mean 
## 
## Coefficients:
##            ar1        ma1      mean
##       0.665909  -0.707699  0.000182
## s.e.  0.189777   0.178365  0.000259
## 
## sigma^2 = 0.000218814:  log likelihood = 7041.36
## AIC=-14074.73   AICc=-14074.71   BIC=-14051.4
# STEP 2: residuals 
residuals_sp500 <- residuals(arima_sp500)
residuals_cac40 <- residuals(arima_cac40)
residuals_nasdaq <- residuals(arima_nasdaq)
residuals_nikkei <- residuals(arima_nikkei)
##############################  AUTO ARIMA ############################## 
# STEP 1: auto arima
plot_sp500_auto<- auto.arima(log_returns_sp500) # AIC of -22513
plot_cac40_auto <- auto.arima(log_returns_cac40) # AIC of -14449 
plot_nasdaq_auto <- auto.arima(log_returns_nasdaq)# AIC of -13395 
plot_nikkei_auto <- auto.arima(log_returns_nikkei)# AIC of -14080
# Create comparison plot
par(mfrow = c(2, 2))
plot(residuals_sp500, main = "Residuals - SP500 OWN ARIMA Model")
plot(plot_sp500_auto$residuals, main = "Residuals - SP500 Auto ARIMA Model")
plot(residuals_cac40, main = "Residuals - CAC40 OWN ARIMA Model")
plot(plot_cac40_auto$residuals, main = "Residuals - CAC40 Auto ARIMA Model")

Code
plot(residuals_nasdaq, main = "Residuals - NASDAQ OWN ARIMA Model")
plot(plot_nasdaq_auto$residuals, main = "Residuals - NASDAQ Auto ARIMA Model")
plot(residuals_nikkei, main = "Residuals - NIKKEI OWN ARIMA Model")
plot(plot_nikkei_auto$residuals, main = "Residuals - NIKKEI Auto ARIMA Model")

Code
par(mfrow = c(1, 1))

In terms of AIC, our manually selected models achieve slightly lower (better) values than auto.arima() for the SP500, CAC40 and NASDAQ, while auto.arima() does marginally better for the NIKKEI. Visually, the residuals of both sets of models fluctuate around 0 with no obvious pattern, so they may well be independent, and there is no significant distinction between the two ARIMA approaches. To make a thorough assessment, we will analyze their residuals using both the Autocorrelation Function (ACF) and the Ljung-Box test.
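Rather than reading the AICs off the printed summaries, the manual and automatic fits can be compared side by side. This sketch uses simulated ARMA data as a stand-in for the negative log returns; with the real data, `x` would be one of the return series and the manual order the one chosen from the ACF/PACF.

```r
# Hedged sketch: programmatic AIC comparison of a manual Arima() fit vs auto.arima().
library(forecast)

set.seed(13)
x <- arima.sim(model = list(ar = 0.3, ma = -0.2), n = 2000) * 0.01

fit_own  <- Arima(x, order = c(1, 0, 1))  # hypothetical manually chosen order
fit_auto <- auto.arima(x)

data.frame(model = c("own ARIMA(1,0,1)", "auto.arima"),
           AIC   = c(AIC(fit_own), AIC(fit_auto)))
```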

1.2.4 (d) Assess the residuals of the resulting models from (c), both their raw values and their absolute values, through visual tools (such as the ACF) and formal tests (e.g. Ljung-Box). What do you conclude about the independence assumption?

Code
# Ljung-Box test for autocorrelation in residuals
# Obtain raw residuals of OWN ARIMA 
raw_residuals_sp500_own<- residuals(arima_sp500)
raw_residuals_cac40_own<- residuals(arima_cac40)
raw_residuals_nasdaq_own<- residuals(arima_nasdaq)
raw_residuals_nikkei_own<- residuals(arima_nikkei)
# Obtain raw residuals of AUTO ARIMA 
raw_residuals_sp500_auto <- residuals(plot_sp500_auto)
raw_residuals_cac40_auto <- residuals(plot_cac40_auto)
raw_residuals_nasdaq_auto <- residuals(plot_nasdaq_auto)
raw_residuals_nikkei_auto <- residuals(plot_nikkei_auto)

# Obtain absolute residuals of OWN ARIMA 
absolute_residuals_sp500_own <- abs(raw_residuals_sp500_own)
absolute_residuals_cac40_own <- abs(raw_residuals_cac40_own)
absolute_residuals_nasdaq_own <- abs(raw_residuals_nasdaq_own)
absolute_residuals_nikkei_own <- abs(raw_residuals_nikkei_own)
# Obtain absolute residuals of AUTO ARIMA 
absolute_residuals_sp500_auto <- abs(raw_residuals_sp500_auto)
absolute_residuals_cac40_auto <- abs(raw_residuals_cac40_auto)
absolute_residuals_nasdaq_auto <- abs(raw_residuals_nasdaq_auto)
absolute_residuals_nikkei_auto <- abs(raw_residuals_nikkei_auto)
# Plot ACF of raw residuals of OWN ARIMA 
par(mfrow = c(2, 2))
Acf(raw_residuals_sp500_own, main = "SP 500")
Acf(raw_residuals_cac40_own, main = "CAC 40")
Acf(raw_residuals_nasdaq_own, main = "NASDAQ")
Acf(raw_residuals_nikkei_own, main = "NIKKEI")
mtext("ACF of Raw Residuals from OWN ARIMA", outer = TRUE, cex = 1, line = -1)

Code
# Plot ACF of raw residuals of AUTO ARIMA 
par(mfrow = c(2, 2))
Acf(raw_residuals_sp500_auto, main = "SP 500")
Acf(raw_residuals_cac40_auto, main = "CAC 40")
Acf(raw_residuals_nasdaq_auto, main = "NASDAQ")
Acf(raw_residuals_nikkei_auto, main = "NIKKEI")
mtext("ACF of Raw Residuals from AUTO ARIMA", outer = TRUE, cex = 1, line = -1)

Code
# Plot ACF of absolute residuals of OWN ARIMA
par(mfrow = c(2, 2))
Acf(absolute_residuals_sp500_own, main = "SP 500")
Acf(absolute_residuals_cac40_own, main = "CAC40")
Acf(absolute_residuals_nasdaq_own, main = "NASDAQ")
Acf(absolute_residuals_nikkei_own, main = "NIKKEI")
mtext("ACF of Abs Residuals Log Returns from OWN ARIMA", outer = TRUE, cex = 1, line = -1)

Code
# Plot ACF of absolute residuals of AUTO ARIMA 
par(mfrow = c(2, 2))
Acf(absolute_residuals_sp500_auto, main = "SP 500")
Acf(absolute_residuals_cac40_auto, main = "CAC 40")
Acf(absolute_residuals_nasdaq_auto, main = "NASDAQ")
Acf(absolute_residuals_nikkei_auto, main = "NIKKEI")
mtext("ACF of Absolute Residuals from AUTO ARIMA", outer = TRUE, cex = 1, line = -1)

Code
par(mfrow = c(1, 1))
# Ljung-Box test on raw residuals of OWN ARIMA
p_values_raw_own <- sapply(list(
  Box.test(raw_residuals_sp500_own, type = "Ljung-Box"),
  Box.test(raw_residuals_cac40_own, type = "Ljung-Box"),
  Box.test(raw_residuals_nasdaq_own, type = "Ljung-Box"),
  Box.test(raw_residuals_nikkei_own, type = "Ljung-Box")
), function(x) x$p.value)

# Ljung-Box test on raw residuals of AUTO ARIMA
p_values_raw_auto <- sapply(list(
  Box.test(raw_residuals_sp500_auto, type = "Ljung-Box"),
  Box.test(raw_residuals_cac40_auto, type = "Ljung-Box"),
  Box.test(raw_residuals_nasdaq_auto, type = "Ljung-Box"),
  Box.test(raw_residuals_nikkei_auto, type = "Ljung-Box")
), function(x) x$p.value)

# Ljung-Box test on absolute residuals of OWN ARIMA
p_values_abs_own <- sapply(list(
  Box.test(absolute_residuals_sp500_own, type = "Ljung-Box"),
  Box.test(absolute_residuals_cac40_own, type = "Ljung-Box"),
  Box.test(absolute_residuals_nasdaq_own, type = "Ljung-Box"),
  Box.test(absolute_residuals_nikkei_own, type = "Ljung-Box")
), function(x) x$p.value)

# Ljung-Box test on absolute residuals of AUTO ARIMA
p_values_abs_auto <- sapply(list(
  Box.test(absolute_residuals_sp500_auto, type = "Ljung-Box"),
  Box.test(absolute_residuals_cac40_auto, type = "Ljung-Box"),
  Box.test(absolute_residuals_nasdaq_auto, type = "Ljung-Box"),
  Box.test(absolute_residuals_nikkei_auto, type = "Ljung-Box")
), function(x) x$p.value)

# Set scipen option to display p-values in full numeric format
options(scipen = 999)

# Create a data frame with p-values and index names
p_values_data <- data.frame(
  Index = rep(c("SP500", "CAC40", "NASDAQ", "NIKKEI"), each = 4),
  Model = rep(c("Raw (OWN ARIMA)", "Raw (AUTO ARIMA)", "Absolute (OWN ARIMA)", "Absolute (AUTO ARIMA)"), times = 4),
  # Interleave the four vectors so each index's four model p-values stay together,
  # matching the Index/Model layout above (plain c() would scramble the labels)
  P_Value = c(rbind(p_values_raw_own, p_values_raw_auto, p_values_abs_own, p_values_abs_auto))
)
# Create a kable with the desired styling
kable_output <- kable(p_values_data, format = "html", escape = FALSE, caption = "P-Values from Ljung-Box Test") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

# Display the styled table
kable_output
P-Values from Ljung-Box Test
Index   Model                   P_Value
SP500   Raw (OWN ARIMA)         0.981090694
SP500   Raw (AUTO ARIMA)        0.929812388
SP500   Absolute (OWN ARIMA)    0.661245435
SP500   Absolute (AUTO ARIMA)   0.975504647
CAC40   Raw (OWN ARIMA)         0.983060220
CAC40   Raw (AUTO ARIMA)        0.495130124
CAC40   Absolute (OWN ARIMA)    0.493721726
CAC40   Absolute (AUTO ARIMA)   0.992305513
NASDAQ  Raw (OWN ARIMA)         0.000000000
NASDAQ  Raw (AUTO ARIMA)        0.000000000
NASDAQ  Absolute (OWN ARIMA)    0.000000000
NASDAQ  Absolute (AUTO ARIMA)   0.000373729
NIKKEI  Raw (OWN ARIMA)         0.000000000
NIKKEI  Raw (AUTO ARIMA)        0.000000000
NIKKEI  Absolute (OWN ARIMA)    0.000000000
NIKKEI  Absolute (AUTO ARIMA)   0.000760171

The ACF plots and Ljung-Box test results for the ARIMA residuals, both raw and absolute, point to different conclusions across the four indices. For the S&P 500 and CAC 40, the large p-values indicate no significant autocorrelation in either the raw or the absolute residuals, so the ARIMA fits appear adequate for these two series. For the NASDAQ and Nikkei, in contrast, the p-values are essentially zero for both the raw and the absolute residuals, decisively rejecting the null hypothesis of independence. The ACF plots confirm this: the NASDAQ and Nikkei residuals show significant spikes at several lags, and the pronounced autocorrelation in their absolute residuals is the classic signature of volatility clustering.

Comparing the two specifications, the AUTO ARIMA residuals display somewhat more pronounced autocorrelation than those of our OWN ARIMA models, suggesting that the manually selected orders capture the series dynamics slightly better. The dependence in the absolute residuals of the NASDAQ and Nikkei nevertheless persists under both specifications.

These findings show that even where raw residuals appear uncorrelated, their absolute values can remain dependent, a sign of non-constant conditional variance that casts doubt on the adequacy of a pure ARIMA model. The implication is clear: a model that handles conditional variance, such as GARCH, is warranted for a more accurate representation of these financial time series.
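One caveat on the tests above: Box.test() defaults to lag = 1, so only first-order autocorrelation is examined. A fuller diagnostic checks several lags and adjusts the degrees of freedom for the number of fitted ARIMA parameters via the fitdf argument. A minimal sketch (the helper name and the white-noise example are illustrative, not part of the analysis above):

```r
# Ljung-Box p-values at several lags; `n_params` would be p + q of the
# fitted ARIMA, passed as `fitdf` to correct the degrees of freedom.
ljung_box_lags <- function(resid_vec, lags = c(5, 10, 20), n_params = 0) {
  sapply(lags, function(k) {
    Box.test(resid_vec, lag = k, type = "Ljung-Box", fitdf = n_params)$p.value
  })
}

# Sanity check on simulated white noise: p-values should be large
set.seed(1)
ljung_box_lags(rnorm(1000), lags = c(5, 10, 20))
```

Applied to any of the residual series above (e.g. raw_residuals_nasdaq_own), this gives a p-value per lag rather than a single lag-1 figure.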

1.2.5 (e) Plot the volatility of the raw series of indices. What is your conclusion on the homoscedasticity assumption?

Code
# Compute the volatility of each raw index series
volatility_sp500 <- volatility(as.vector(sp500.df$SP500))
volatility_cac40 <- volatility(as.vector(cac40.df$CAC40))
volatility_nasdaq <- volatility(as.vector(nasdaq.df$NASDAQ))
volatility_nikkei <- volatility(as.vector(nikkei.df$NIKKEI))
# Create new data frames for each index that combine dates and volatility
sp500_data <- data.frame(Date = sp500.df$DATE, Volatility = volatility_sp500)
cac40_data <- data.frame(Date = cac40.df$DATE, Volatility = volatility_cac40)
nasdaq_data <- data.frame(Date = nasdaq.df$DATE, Volatility = volatility_nasdaq)
nikkei_data <- data.frame(Date = nikkei.df$DATE, Volatility = volatility_nikkei)

# Now plot using the new data frames for the x-axis and y-axis
p <- plot_ly() %>%
    add_trace(data = sp500_data, x = ~Date, y = ~Volatility, name = 'S&P 500', type = 'scatter', mode = 'lines', line = list(color = '#1f77b4')) %>%
    add_trace(data = cac40_data, x = ~Date, y = ~Volatility, name = 'CAC 40', type = 'scatter', mode = 'lines', line = list(color = '#ff7f0e'), visible = F) %>%
    add_trace(data = nasdaq_data, x = ~Date, y = ~Volatility, name = 'NASDAQ', type = 'scatter', mode = 'lines', line = list(color = '#2ca02c'), visible = F) %>%
    add_trace(data = nikkei_data, x = ~Date, y = ~Volatility, name = 'Nikkei', type = 'scatter', mode = 'lines', line = list(color = '#d62728'), visible = F) %>%
    layout(
        title = "Volatility of Major Stock Indices",
        xaxis = list(title = "Date", type = 'date'), # Ensure that the x-axis is treated as a date
        yaxis = list(title = "Volatility"),
        updatemenus = list(
            list(
                type = "buttons",
                active = -1,
                buttons = list(
                    list(label = "S&P 500",
                         method = "update",
                         args = list(list(visible = c(TRUE, FALSE, FALSE, FALSE)),
                                     list(title = "Volatility of the Raw S&P 500"))),
                    list(label = "CAC 40",
                         method = "update",
                         args = list(list(visible = c(FALSE, TRUE, FALSE, FALSE)),
                                     list(title = "Volatility of the Raw CAC 40"))),
                    list(label = "NASDAQ",
                         method = "update",
                         args = list(list(visible = c(FALSE, FALSE, TRUE, FALSE)),
                                     list(title = "Volatility of the Raw NASDAQ"))),
                    list(label = "Nikkei",
                         method = "update",
                         args = list(list(visible = c(FALSE, FALSE, FALSE, TRUE)),
                                     list(title = "Volatility of the Raw NIKKEI")))
                )
            )
        )
    )

# Display the plot
p

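The volatility() call above (presumably TTR's, which by default annualizes a 10-day close-to-close estimate with N = 260 trading days) can be mimicked by a rolling standard deviation of log returns. A minimal sketch under those assumptions, with a simulated price series for illustration:

```r
# Manual counterpart of a close-to-close volatility estimate: a rolling
# standard deviation of log returns, annualized with sqrt(trading_days).
rolling_volatility <- function(price, n = 10, trading_days = 260) {
  r <- diff(log(price))                        # log returns
  sapply(n:length(r), function(i) {
    sd(r[(i - n + 1):i]) * sqrt(trading_days)  # annualized rolling sd
  })
}

# Illustrative random-walk price series (not one of the indices above)
set.seed(7)
fake_price <- cumprod(c(100, exp(rnorm(99, 0, 0.01))))
head(rolling_volatility(fake_price, n = 10))
```

The window length n = 10 is an arbitrary choice; shorter windows track volatility spikes more closely at the cost of noisier estimates.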
The volatility plots reveal periods with pronounced spikes (e.g., the NASDAQ between 1999 and 2000), which challenges the homoscedasticity assumption, namely that the variance of the error term remains constant throughout the time series. This constant-variance assumption underlies the ARIMA models fitted above and is also a foundational element of the Black-Scholes model, which requires stable volatility to price options accurately. The heteroscedasticity evident here, with variance fluctuating over time, undermines those premises.

We will therefore fit a GARCH model, which explicitly accounts for the heteroscedasticity of the residuals.
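Before fitting GARCH, heteroscedasticity can also be tested formally with Engle's ARCH LM test, which regresses squared residuals on their own lags and compares n·R² to a chi-squared distribution with q degrees of freedom. A minimal base-R sketch (the function name is illustrative; a packaged version exists as ArchTest() in the FinTS library):

```r
# Engle's ARCH LM test: regress squared residuals on q of their own lags,
# then compute the LM statistic n * R^2 ~ chi-squared(q) under the null
# of no ARCH effects (homoscedasticity).
arch_lm_test <- function(resid_vec, q = 5) {
  e2 <- resid_vec^2
  emb <- embed(e2, q + 1)               # col 1: e2_t; cols 2..q+1: its lags
  fit <- lm(emb[, 1] ~ emb[, -1])
  stat <- nrow(emb) * summary(fit)$r.squared
  p_val <- pchisq(stat, df = q, lower.tail = FALSE)
  c(statistic = stat, p.value = p_val)
}

# Under homoscedastic white noise the p-value should be large
set.seed(42)
arch_lm_test(rnorm(1000), q = 5)
```

A small p-value on the index residuals would confirm the ARCH effects that motivate the GARCH specification below.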

1.2.6 (f) Fit GARCH models to the negative log returns of each series with both standardised and skewed t-distributions, with order (1, 1), using the garchFit() function from the fGarch library. Assess the quality of the fit by evaluating the residuals.

Code
# Fit a GARCH(1,1) model with standard t-distribution to the log returns
garch_sp500_std <- garchFit(~garch(1, 1), data = log_returns_sp500$SP500, cond.dist = "std")
## 
## Final Estimate of the Negative LLH:
##  LLH:  -11798.7241    norm LLH:  -3.28930138 
##                 mu              omega             alpha1 
## -0.000561968905040  0.000000323448739  0.050936566979092 
##              beta1              shape 
##  0.947648620437824  7.169242601046535 
garch_cac40_std <- garchFit(~garch(1, 1), data = log_returns_cac40$CAC40, cond.dist = "std")
## 
## Final Estimate of the Negative LLH:
##  LLH:  -7496.6096    norm LLH:  -2.91017453 
##                mu             omega            alpha1 
## -0.00052677030655  0.00000208756159  0.06492546081799 
##             beta1             shape 
##  0.92750085138243 10.00000000000000 
garch_NASDAQ_std <- garchFit(~garch(1, 1), data = log_returns_nasdaq$NASDAQ, cond.dist = "std")
## 
## Final Estimate of the Negative LLH:
##  LLH:  -7293.93397    norm LLH:  -2.83149611 
##                mu             omega            alpha1 
## -0.00113534244917  0.00000109731129  0.09632908877410 
##             beta1             shape 
##  0.90623905076739 10.00000000000000 
garch_nikkei_std <- garchFit(~garch(1, 1), data = log_returns_nikkei$NIKKEI, cond.dist = "std")
## 
## Final Estimate of the Negative LLH:
##  LLH:  -7206.86852    norm LLH:  -2.86100378 
##                mu             omega            alpha1 
## -0.00001015923940  0.00000305123284  0.06439398289864 
##             beta1             shape 
##  0.92298268497996  8.30324845145830 

# Fit a GARCH(1,1) model with skewed t-distribution to the log returns
garch_sp500_sstd <- garchFit(~garch(1, 1), data = log_returns_sp500$SP500, cond.dist = "sstd")
## 
## Series Initialization:
##  ARMA Model:                arma
##  Formula Mean:              ~ arma(0, 0)
##  GARCH Model:               garch
##  Formula Variance:          ~ garch(1, 1)
##  ARMA Order:                0 0
##  Max ARMA Order:            0
##  GARCH Order:               1 1
##  Max GARCH Order:           1
##  Maximum Order:             1
##  Conditional Dist:          sstd
##  h.start:                   2
##  llh.start:                 1
##  Length of Series:          3587
##  Recursion Init:            mci
##  Series Scale:              0.010491962
## 
## Parameter Initialization:
##  Initial Parameters:          $params
##  Limits of Transformations:   $U, $V
##  Which Parameters are Fixed?  $includes
##  Parameter Matrix:
##                       U             V        params includes
##     mu     -0.299230155   0.299230155 -0.0299230155     TRUE
##     omega   0.000001000 100.000000000  0.1000000000     TRUE
##     alpha1  0.000000010   0.999999990  0.1000000000     TRUE
##     gamma1 -0.999999990   0.999999990  0.1000000000    FALSE
##     beta1   0.000000010   0.999999990  0.8000000000     TRUE
##     delta   0.000000000   2.000000000  2.0000000000    FALSE
##     skew    0.100000000  10.000000000  1.0000000000     TRUE
##     shape   1.000000000  10.000000000  4.0000000000     TRUE
##  Index List of Parameters to be Optimized:
##     mu  omega alpha1  beta1   skew  shape 
##      1      2      3      5      7      8 
##  Persistence:                  0.9 
## 
## 
## --- START OF TRACE ---
## Selected Algorithm: nlminb 
## 
## R coded nlminb Solver: 
## 
##   0:     4669.6617: -0.0299230 0.100000 0.100000 0.800000  1.00000  4.00000
##   1:     4653.0122: -0.0299267 0.0964117 0.126464 0.816367  1.00271  4.00077
##   2:     4639.6539: -0.0299304 0.0666668 0.129323 0.806918  1.00532  4.00081
##   3:     4603.6053: -0.0299433 0.0369250 0.173799 0.838961  1.01327  4.00268
##   4:     4601.0799: -0.0299447 0.0307286 0.172145 0.838478  1.01387  4.00277
##   5:     4598.3055: -0.0299504 0.0271574 0.173946 0.852297  1.01612  4.00348
##   6:     4594.1355: -0.0299647 0.00781362 0.161708 0.869597  1.02109  4.00489
##   7:     4585.7217: -0.0299949 0.0127017 0.143325 0.888652  1.03197  4.00727
##   8:     4578.4991: -0.0300266 0.0103938 0.117858 0.899401  1.04023  4.01066
##   9:     4569.2398: -0.0301119 1.00000e-06 0.0473501 0.961918  1.02480  4.02599
##  10:     4568.4647: -0.0301120 0.00125620 0.0475983 0.962202  1.02481  4.02600
##  11:     4567.8643: -0.0301096 0.000914651 0.0473125 0.961752  1.02581  4.02655
##  12:     4567.6179: -0.0301098 0.00182122 0.0470372 0.961127  1.02582  4.02659
##  13:     4567.5231: -0.0301099 0.00144990 0.0468501 0.960776  1.02583  4.02660
##  14:     4567.4316: -0.0301089 0.00168384 0.0469252 0.960833  1.02625  4.02683
##  15:     4567.4001: -0.0301070 0.00173174 0.0468776 0.960446  1.02713  4.02734
##  16:     4567.3519: -0.0301025 0.00191958 0.0469521 0.960510  1.02902  4.02837
##  17:     4567.3263: -0.0300979 0.00184647 0.0469177 0.960436  1.03093  4.02941
##  18:     4567.3185: -0.0300934 0.00195228 0.0469701 0.960447  1.03282  4.03046
##  19:     4567.3110: -0.0300901 0.00184852 0.0469098 0.960395  1.03434  4.03144
##  20:     4567.2933: -0.0300945 0.00192935 0.0470242 0.960321  1.03270  4.03070
##  21:     4567.2916: -0.0300979 0.00185197 0.0471991 0.960005  1.03153  4.03029
##  22:     4567.2567: -0.0300968 0.00197759 0.0472778 0.960055  1.03205  4.03063
##  23:     4567.2408: -0.0300946 0.00191262 0.0474027 0.959889  1.03312  4.03133
##  24:     4567.2255: -0.0300908 0.00200912 0.0474347 0.959942  1.03519  4.03288
##  25:     4567.1979: -0.0300990 0.00198194 0.0475271 0.959820  1.03268  4.03232
##  26:     4567.1902: -0.0301000 0.00221171 0.0483934 0.959334  1.03261  4.03254
##  27:     4567.0945: -0.0300987 0.00201984 0.0484211 0.959124  1.03338  4.03317
##  28:     4567.0512: -0.0300969 0.00218102 0.0489653 0.958734  1.03481  4.03453
##  29:     4566.9959: -0.0300937 0.00204705 0.0491664 0.958451  1.03775  4.03748
##  30:     4565.0621: -0.0308562 0.00271512 0.0456516 0.957570  1.00333  4.25389
##  31:     4561.5588: -0.0314400 0.00427691 0.0467255 0.955203  1.01828  4.47314
##  32:     4560.1802: -0.0321088 0.00381466 0.0468968 0.952024  1.03207  4.69219
##  33:     4556.8352: -0.0336757 0.00477597 0.0472940 0.952257  1.03577  4.90649
##  34:     4556.7565: -0.0378182 1.00000e-06 0.0461646 0.959085  1.00620  5.05009
##  35:     4555.5444: -0.0378182 0.000948092 0.0462985 0.959231  1.00621  5.05009
##  36:     4555.2007: -0.0378236 0.000819874 0.0459122 0.958688  1.00626  5.05076
##  37:     4554.9440: -0.0378380 0.00135827 0.0459048 0.958636  1.00632  5.05255
##  38:     4554.7421: -0.0378678 0.00122750 0.0456732 0.958279  1.00644  5.05626
##  39:     4554.5604: -0.0379276 0.00160330 0.0456562 0.958174  1.00665  5.06373
##  40:     4554.3372: -0.0380472 0.00152654 0.0455037 0.957858  1.00705  5.07867
##  41:     4549.2315: -0.0448412 0.00311834 0.0454229 0.953629  1.02821  5.92660
##  42:     4547.9312: -0.0529603 0.00333035 0.0609406 0.939427  1.02202  6.60500
##  43:     4547.4877: -0.0436517 0.00318385 0.0583392 0.941104  1.03282  6.85027
##  44:     4547.3603: -0.0514537 0.00271613 0.0478869 0.949503  1.03091  7.14841
##  45:     4546.9367: -0.0491274 0.00321748 0.0525694 0.945823  1.03191  7.04196
##  46:     4546.8913: -0.0485234 0.00314809 0.0520351 0.946279  1.03174  7.14970
##  47:     4546.8661: -0.0486926 0.00306594 0.0513661 0.946909  1.03147  7.29362
##  48:     4546.8654: -0.0486880 0.00306872 0.0514547 0.946835  1.03160  7.31370
##  49:     4546.8654: -0.0487045 0.00307465 0.0514905 0.946798  1.03164  7.31729
##  50:     4546.8654: -0.0487063 0.00307550 0.0514963 0.946792  1.03164  7.31733
## 
## Final Estimate of the Negative LLH:
##  LLH:  -11799.6167    norm LLH:  -3.28955025 
##                 mu              omega             alpha1 
## -0.000511025067784  0.000000338554507  0.051496286897391 
##              beta1               skew              shape 
##  0.946791537800483  1.031641378453867  7.317331600454360 
## 
## R-optimhess Difference Approximated Hessian Matrix:
##                      mu               omega             alpha1
## mu      -64012835.88032       3626005545.62      136173.298861
## omega  3626005545.62040 -163719171702614.31 -6043293213.656237
## alpha1     136173.29886      -6043293213.66     -390598.569134
## beta1      234197.47840      -7581221122.21     -440038.171538
## skew       101771.56382         13719948.47         277.298961
## shape        -188.58995         -6097657.91        -409.334669
##                       beta1              skew             shape
## mu          234197.47839738   101771.56382388     -188.58995018
## omega  -7581221122.21457481 13719948.46797637 -6097657.90734438
## alpha1     -440038.17153839      277.29896090     -409.33466857
## beta1      -521170.68172879       -8.94391098     -475.62179748
## skew            -8.94391098    -1925.19629718        6.57630289
## shape         -475.62179748        6.57630289       -1.82402596
## attr(,"time")
## Time difference of 0.0947899818 secs
## 
## --- END OF TRACE ---
## 
## 
## Time to Estimate Parameters:
##  Time difference of 0.375400066 secs
garch_cac40_sstd <- garchFit(~garch(1, 1), data = log_returns_cac40$CAC40, cond.dist = "sstd")
## 
## Series Initialization:
##  ARMA Model:                arma
##  Formula Mean:              ~ arma(0, 0)
##  GARCH Model:               garch
##  Formula Variance:          ~ garch(1, 1)
##  ARMA Order:                0 0
##  Max ARMA Order:            0
##  GARCH Order:               1 1
##  Max GARCH Order:           1
##  Maximum Order:             1
##  Conditional Dist:          sstd
##  h.start:                   2
##  llh.start:                 1
##  Length of Series:          2576
##  Recursion Init:            mci
##  Series Scale:              0.0146431753
## 
## Parameter Initialization:
##  Initial Parameters:          $params
##  Limits of Transformations:   $U, $V
##  Which Parameters are Fixed?  $includes
##  Parameter Matrix:
##                       U             V        params includes
##     mu     -0.117670784   0.117670784 -0.0117670784     TRUE
##     omega   0.000001000 100.000000000  0.1000000000     TRUE
##     alpha1  0.000000010   0.999999990  0.1000000000     TRUE
##     gamma1 -0.999999990   0.999999990  0.1000000000    FALSE
##     beta1   0.000000010   0.999999990  0.8000000000     TRUE
##     delta   0.000000000   2.000000000  2.0000000000    FALSE
##     skew    0.100000000  10.000000000  1.0000000000     TRUE
##     shape   1.000000000  10.000000000  4.0000000000     TRUE
##  Index List of Parameters to be Optimized:
##     mu  omega alpha1  beta1   skew  shape 
##      1      2      3      5      7      8 
##  Persistence:                  0.9 
## 
## 
## --- START OF TRACE ---
## Selected Algorithm: nlminb 
## 
## R coded nlminb Solver: 
## 
##   0:     3473.1134: -0.0117671 0.100000 0.100000 0.800000  1.00000  4.00000
##   1:     3456.6234: -0.0117674 0.112497 0.116303 0.817129  1.00177  4.00087
##   2:     3451.1516: -0.0117684 0.0883772 0.125572 0.816463  1.00878  4.00228
##   3:     3440.3939: -0.0117711 0.0594924 0.146805 0.852801  1.02458  4.00692
##   4:     3433.9752: -0.0117746 0.0204210 0.130484 0.880633  1.04090  4.01348
##   5:     3430.2552: -0.0117808 0.0148229 0.108533 0.919264  1.06876  4.02322
##   6:     3426.4696: -0.0117906 0.0190572 0.0826406 0.916498  1.11047  4.04420
##   7:     3422.4179: -0.0118009 0.0140989 0.0920492 0.920026  1.08285  4.08878
##   8:     3421.8112: -0.0118058 0.0106079 0.0822244 0.931336  1.07971  4.10748
##   9:     3421.2008: -0.0118120 0.0134890 0.0794336 0.927398  1.07769  4.13116
##  10:     3421.1851: -0.0118146 0.0159000 0.0798901 0.927085  1.07904  4.14015
##  11:     3420.8242: -0.0118158 0.0145393 0.0799693 0.926951  1.07905  4.14466
##  12:     3420.7224: -0.0118161 0.0126328 0.0809351 0.928449  1.07912  4.14556
##  13:     3420.6197: -0.0118182 0.0126620 0.0769050 0.930140  1.07986  4.15287
##  14:     3420.4152: -0.0118206 0.0132504 0.0774962 0.930494  1.08033  4.16136
##  15:     3409.4534: -0.0120779 0.0198340 0.100732 0.898121  1.13359  5.06209
##  16:     3405.3905: -0.0121717 0.0206254 0.0994449 0.895329  1.05802  5.38285
##  17:     3402.2965: -0.0124325 0.0170615 0.0748529 0.923736  1.06803  5.70942
##  18:     3401.0614: -0.0125156 0.0106594 0.0648111 0.926766  1.07081  5.81203
##  19:     3396.9735: -0.0129538 0.0115516 0.0646113 0.931979  1.07220  5.90858
##  20:     3396.0251: -0.0136502 0.00736353 0.0539441 0.944653  1.07685  6.15788
##  21:     3394.6546: -0.0154900 0.00493736 0.0604765 0.940501  1.07587  6.36147
##  22:     3392.3624: -0.0196513 0.00932084 0.0537193 0.936559  1.07863  6.96758
##  23:     3387.5192: -0.0246183 0.0121047 0.0637556 0.925747  1.07964  7.52803
##  24:     3382.9117: -0.0319741 0.0117461 0.0677171 0.922111  1.08038  8.86704
##  25:     3380.4854: -0.0300357 0.00815360 0.0660479 0.929307  1.08615  10.0000
##  26:     3380.3307: -0.0299194 0.00941670 0.0663094 0.926737  1.08420  10.0000
##  27:     3380.3204: -0.0289718 0.00955626 0.0657268 0.926979  1.08385  10.0000
##  28:     3380.3187: -0.0283029 0.00954085 0.0654048 0.927254  1.08390  10.0000
##  29:     3380.3186: -0.0282098 0.00951904 0.0653913 0.927298  1.08394  10.0000
##  30:     3380.3186: -0.0281953 0.00951464 0.0653985 0.927299  1.08395  10.0000
##  31:     3380.3186: -0.0281932 0.00951496 0.0654010 0.927296  1.08395  10.0000
## 
## Final Estimate of the Negative LLH:
##  LLH:  -7500.14097    norm LLH:  -2.91154541 
##                mu             omega            alpha1 
## -0.00041283825041  0.00000204022256  0.06540099338288 
##             beta1              skew             shape 
##  0.92729620349721  1.08395284122941 10.00000000000000 
## 
## R-optimhess Difference Approximated Hessian Matrix:
##                        mu               omega             alpha1
## mu      -18410609.2864353      1370101865.670      143767.465094
## omega  1370101865.6704085 -10110289518795.613 -1065611007.450038
## alpha1     143767.4650935     -1065611007.450     -159781.709719
## beta1      193801.2596236     -1346060827.070     -180384.810395
## skew        21622.0348696         1971731.318         583.545147
## shape          56.0626501         -825649.111        -111.707695
##                     beta1             skew             shape
## mu          193801.259624   21622.03486962      56.062650072
## omega  -1346060827.069787 1971731.31828064 -825649.111094690
## alpha1     -180384.810395     583.54514658    -111.707694657
## beta1      -219900.079695     530.05441424    -132.668237256
## skew           530.054414    -934.34917176       1.226022050
## shape         -132.668237       1.22602205      -0.637212679
## attr(,"time")
## Time difference of 0.0643370152 secs
## 
## --- END OF TRACE ---
## 
## 
## Time to Estimate Parameters:
##  Time difference of 0.188814163 secs
garch_NASDAQ_sstd <- garchFit(~garch(1, 1), data = log_returns_nasdaq$NASDAQ, cond.dist = "sstd")
## 
## Series Initialization:
##  ARMA Model:                arma
##  Formula Mean:              ~ arma(0, 0)
##  GARCH Model:               garch
##  Formula Variance:          ~ garch(1, 1)
##  ARMA Order:                0 0
##  Max ARMA Order:            0
##  GARCH Order:               1 1
##  Max GARCH Order:           1
##  Maximum Order:             1
##  Conditional Dist:          sstd
##  h.start:                   2
##  llh.start:                 1
##  Length of Series:          2576
##  Recursion Init:            mci
##  Series Scale:              0.0179973845
## 
## Parameter Initialization:
##  Initial Parameters:          $params
##  Limits of Transformations:   $U, $V
##  Which Parameters are Fixed?  $includes
##  Parameter Matrix:
##                       U             V        params includes
##     mu     -0.202102822   0.202102822 -0.0202102822     TRUE
##     omega   0.000001000 100.000000000  0.1000000000     TRUE
##     alpha1  0.000000010   0.999999990  0.1000000000     TRUE
##     gamma1 -0.999999990   0.999999990  0.1000000000    FALSE
##     beta1   0.000000010   0.999999990  0.8000000000     TRUE
##     delta   0.000000000   2.000000000  2.0000000000    FALSE
##     skew    0.100000000  10.000000000  1.0000000000     TRUE
##     shape   1.000000000  10.000000000  4.0000000000     TRUE
##  Index List of Parameters to be Optimized:
##     mu  omega alpha1  beta1   skew  shape 
##      1      2      3      5      7      8 
##  Persistence:                  0.9 
## 
## 
## --- START OF TRACE ---
## Selected Algorithm: nlminb 
## 
## R coded nlminb Solver: 
## 
##   0:     3223.6588: -0.0202103 0.100000 0.100000 0.800000  1.00000  4.00000
##   1:     3113.7901: -0.0202212 0.0147710 0.200358 0.815642  1.02470  4.00162
##   2:     3111.9347: -0.0202233 0.0313783 0.207648 0.828407  1.02713  4.00215
##   3:     3101.7809: -0.0202246 0.0205601 0.207150 0.826549  1.02908  4.00226
##   4:     3099.1852: -0.0202279 0.0118375 0.209064 0.831747  1.03326  4.00274
##   5:     3096.6097: -0.0202319 0.0152224 0.213896 0.839661  1.03843  4.00335
##   6:     3093.4399: -0.0202393 0.00977823 0.211402 0.842442  1.04737  4.00423
##   7:     3091.6475: -0.0202471 0.0117412 0.206055 0.846550  1.05598  4.00521
##   8:     3088.6661: -0.0202635 0.00958391 0.191050 0.852206  1.07117  4.00747
##   9:     3086.0740: -0.0202855 0.00498620 0.187493 0.870716  1.08129  4.01173
##  10:     3083.3286: -0.0203105 0.00795127 0.169069 0.871256  1.09242  4.01664
##  11:     3083.0362: -0.0203125 0.00799118 0.169410 0.874049  1.09409  4.01701
##  12:     3082.7267: -0.0203174 0.00619743 0.167343 0.875202  1.09470  4.01828
##  13:     3082.3835: -0.0203230 0.00689113 0.165252 0.877076  1.09517  4.01976
##  14:     3079.7245: -0.0204372 0.00437291 0.121526 0.904261  1.10095  4.05067
##  15:     3079.3175: -0.0206123 0.00589563 0.128544 0.907385  1.14029  4.09532
##  16:     3076.4515: -0.0206965 0.00284375 0.132797 0.900964  1.15832  4.11787
##  17:     3075.9142: -0.0207886 0.00672205 0.139929 0.891599  1.17097  4.14206
##  18:     3074.0067: -0.0212318 0.00466240 0.135209 0.899588  1.16354  4.26073
##  19:     3074.0040: -0.0212320 0.00289860 0.134434 0.898522  1.16356  4.26077
##  20:     3073.5468: -0.0212321 0.00398007 0.134486 0.898722  1.16356  4.26079
##  21:     3073.3942: -0.0212332 0.00494116 0.132965 0.897477  1.16363  4.26103
##  22:     3073.3511: -0.0212415 0.00463567 0.132823 0.897361  1.16358  4.26317
##  23:     3060.3718: -0.0269922 0.00499694 0.136274 0.876037  1.12303  5.74384
##  24:     3058.5292: -0.0280434 0.00702561 0.148853 0.866365  1.18058  5.93456
##  25:     3058.3377: -0.0296060 0.00827872 0.139630 0.875798  1.16746  5.91264
##  26:     3056.5015: -0.0310462 0.00610779 0.134637 0.878705  1.17395  5.95324
##  27:     3054.4229: -0.0391856 0.00466612 0.113243 0.895996  1.20628  6.18631
##  28:     3051.6169: -0.0640481 0.00317011 0.113506 0.895743  1.15105  6.57074
##  29:     3051.2410: -0.0539421 0.00405261 0.0841264 0.911144  1.12388  7.43925
##  30:     3046.2402: -0.0478230 0.00554103 0.0991353 0.898865  1.14118  8.69300
##  31:     3045.7330: -0.0439016 0.00153183 0.101435 0.904181  1.17111  9.92285
##  32:     3044.4617: -0.0574725 0.00510543 0.105853 0.894901  1.13738  10.0000
##  33:     3043.8656: -0.0513095 0.00406233 0.103462 0.898764  1.14742  10.0000
##  34:     3043.7908: -0.0510429 0.00328100 0.100508 0.902622  1.14870  10.0000
##  35:     3043.7816: -0.0512085 0.00345153 0.100361 0.902332  1.14785  10.0000
##  36:     3043.7810: -0.0513146 0.00343920 0.0999974 0.902620  1.14746  10.0000
##  37:     3043.7809: -0.0513497 0.00343280 0.0999021 0.902702  1.14736  10.0000
##  38:     3043.7809: -0.0513490 0.00343274 0.0999019 0.902703  1.14736  10.0000
## 
## Final Estimate of the Negative LLH:
##  LLH:  -7305.37334    norm LLH:  -2.83593686 
##               mu            omega           alpha1            beta1 
## -0.0009241480753  0.0000011118847  0.0999019402994  0.9027027692365 
##             skew            shape 
##  1.1473631675905 10.0000000000000 
## 
## R-optimhess Difference Approximated Hessian Matrix:
##                      mu               omega             alpha1
## mu      -21838402.01326      2219453751.386     175977.8917874
## omega  2219453751.38561 -11568301599418.266 -661533107.9115239
## alpha1     175977.89179      -661533107.912     -86041.0356302
## beta1      259227.42596      -919810283.981    -102552.3310522
## skew        27147.80402         1936525.332        516.3432768
## shape         125.23469         -601685.504        -75.2822697
##                     beta1             skew             shape
## mu         259227.4259607   27147.80401528     125.234689838
## omega  -919810283.9811141 1936525.33225076 -601685.503565683
## alpha1    -102552.3310522     516.34327684     -75.282269677
## beta1     -130282.0858389     426.83368991     -91.909894365
## skew          426.8336899    -969.80831051       1.060579677
## shape         -91.9098944       1.06057968      -0.475999968
## attr(,"time")
## Time difference of 0.0658669472 secs
## 
## --- END OF TRACE ---
## 
## 
## Time to Estimate Parameters:
##  Time difference of 0.218129158 secs
garch_nikkei_sstd <- garchFit(~garch(1, 1), data = log_returns_nikkei$NIKKEI, cond.dist = "sstd")
## 
## Series Initialization:
##  ARMA Model:                arma
##  Formula Mean:              ~ arma(0, 0)
##  GARCH Model:               garch
##  Formula Variance:          ~ garch(1, 1)
##  ARMA Order:                0 0
##  Max ARMA Order:            0
##  GARCH Order:               1 1
##  Max GARCH Order:           1
##  Maximum Order:             1
##  Conditional Dist:          sstd
##  h.start:                   2
##  llh.start:                 1
##  Length of Series:          2519
##  Recursion Init:            mci
##  Series Scale:              0.0148104327
## 
## Parameter Initialization:
##  Initial Parameters:          $params
##  Limits of Transformations:   $U, $V
##  Which Parameters are Fixed?  $includes
##  Parameter Matrix:
##                       U             V       params includes
##     mu     -0.109820691   0.109820691 0.0109820691     TRUE
##     omega   0.000001000 100.000000000 0.1000000000     TRUE
##     alpha1  0.000000010   0.999999990 0.1000000000     TRUE
##     gamma1 -0.999999990   0.999999990 0.1000000000    FALSE
##     beta1   0.000000010   0.999999990 0.8000000000     TRUE
##     delta   0.000000000   2.000000000 2.0000000000    FALSE
##     skew    0.100000000  10.000000000 1.0000000000     TRUE
##     shape   1.000000000  10.000000000 4.0000000000     TRUE
##  Index List of Parameters to be Optimized:
##     mu  omega alpha1  beta1   skew  shape 
##      1      2      3      5      7      8 
##  Persistence:                  0.9 
## 
## 
## --- START OF TRACE ---
## Selected Algorithm: nlminb 
## 
## R coded nlminb Solver: 
## 
##   0:     3458.8282: 0.0109821 0.100000 0.100000 0.800000  1.00000  4.00000
##   1:     3443.1465: 0.0109820 0.113959 0.114452 0.816953  1.00050  4.00073
##   2:     3434.8002: 0.0109806 0.0592623 0.146388 0.838044  1.00900  4.00638
##   3:     3434.3721: 0.0109806 0.0600469 0.146633 0.842009  1.00923  4.00672
##   4:     3433.9715: 0.0109805 0.0568951 0.144246 0.842864  1.00950  4.00707
##   5:     3433.1785: 0.0109803 0.0546090 0.143082 0.850495  1.01022  4.00802
##   6:     3431.3662: 0.0109798 0.0424814 0.138001 0.859738  1.01182  4.01004
##   7:     3429.4043: 0.0109785 0.0252511 0.132523 0.885844  1.01575  4.01604
##   8:     3428.8066: 0.0109785 0.0238344 0.130528 0.884275  1.01576  4.01611
##   9:     3428.2495: 0.0109783 0.0275618 0.125026 0.887469  1.01590  4.01682
##  10:     3427.2590: 0.0109760 0.0249661 0.119502 0.888526  1.02217  4.02872
##  11:     3425.1821: 0.0109651 0.00775887 0.0967806 0.919515 0.994714  4.11524
##  12:     3424.6814: 0.0109651 0.00998431 0.0966147 0.920187 0.994875  4.11531
##  13:     3424.3020: 0.0109650 0.00989816 0.0948546 0.918727 0.995147  4.11570
##  14:     3423.8427: 0.0109643 0.0115179 0.0938639 0.919977 0.996549  4.11954
##  15:     3423.2978: 0.0109627 0.0110254 0.0907677 0.920285 0.999763  4.12773
##  16:     3422.5687: 0.0109595 0.0116982 0.0872546 0.923689  1.00563  4.14478
##  17:     3421.5951: 0.0109533 0.0107337 0.0805567 0.926601  1.00811  4.18136
##  18:     3420.8837: 0.0109475 0.0129750 0.0776887 0.927752 0.998032  4.21717
##  19:     3419.3645: 0.0109232 0.00950133 0.0696985 0.935878  1.02405  4.32362
##  20:     3419.3453: 0.0109232 0.00937309 0.0689918 0.935045  1.02400  4.32368
##  21:     3419.2349: 0.0109232 0.00985450 0.0691212 0.935278  1.02398  4.32370
##  22:     3419.1490: 0.0109232 0.0108067 0.0686934 0.934965  1.02387  4.32382
##  23:     3419.1047: 0.0109228 0.0107119 0.0682920 0.934592  1.02377  4.32595
##  24:     3419.0522: 0.0109223 0.0110191 0.0682877 0.934748  1.02369  4.32813
##  25:     3414.8758: 0.0107801 0.00858884 0.0485524 0.945749  1.00668  4.98388
##  26:     3408.6962: 0.0106065 0.0144451 0.0700865 0.922638  1.01676  5.63932
##  27:     3408.2199: 0.0102034 0.0151041 0.0747420 0.915514  1.01928  5.88862
##  28:     3405.4927: 0.00493074 0.0169243 0.0702205 0.916078  1.01201  6.76603
##  29:     3404.4935: 0.00130955 0.0144987 0.0659206 0.921599  1.01004  7.38896
##  30:     3404.1954: -0.000255938 0.0133476 0.0632441 0.924750  1.01405  7.88123
##  31:     3404.1324: -0.000228772 0.0133896 0.0636464 0.924287  1.01286  8.12458
##  32:     3404.1138: 0.000491172 0.0138875 0.0643135 0.923030  1.01365  8.29683
##  33:     3404.1123: 0.000952049 0.0138307 0.0645565 0.922896  1.01311  8.32046
##  34:     3404.1120: 0.00120493 0.0138546 0.0645203 0.922903  1.01335  8.31826
##  35:     3404.1119: 0.00137308 0.0138477 0.0644732 0.922945  1.01345  8.31274
##  36:     3404.1119: 0.00137490 0.0138448 0.0644683 0.922956  1.01345  8.31165
##  37:     3404.1119: 0.00137365 0.0138447 0.0644687 0.922956  1.01344  8.31176
## 
## Final Estimate of the Negative LLH:
##  LLH:  -7206.98271    norm LLH:  -2.86104911 
##               mu            omega           alpha1            beta1 
## 0.00002034438051 0.00000303680925 0.06446868333045 0.92295553109190 
##             skew            shape 
## 1.01344470541039 8.31175523792201 
## 
## R-optimhess Difference Approximated Hessian Matrix:
##                       mu              omega            alpha1
## mu     -16209770.1245553      182726128.384      13014.590040
## omega  182726128.3836664 -6111998955230.102 -773571475.442117
## alpha1     13014.5900396     -773571475.442    -135229.390309
## beta1      25387.4175794     -974162175.859    -150827.933256
## skew       36542.1302604       -3118414.474       -170.980854
## shape        -57.8172101        -843162.081       -139.554917
##                    beta1               skew             shape
## mu          25387.417579    36542.130260447     -57.817210136
## omega  -974162175.858694 -3118414.474487743 -843162.080809816
## alpha1    -150827.933256     -170.980853909    -139.554916700
## beta1     -181256.923570     -345.282780643    -167.707932917
## skew         -345.282781    -1331.570890898       0.238565462
## shape        -167.707933        0.238565462      -0.817103569
## attr(,"time")
## Time difference of 0.0670919418 secs
## 
## --- END OF TRACE ---
## 
## 
## Time to Estimate Parameters:
##  Time difference of 0.21694994 secs

# After fitting the models, evaluate the residuals:
# For standardized t-distribution
residuals_sp500_std <- residuals(garch_sp500_std, standardize = TRUE)
residuals_cac40_std <- residuals(garch_cac40_std, standardize = TRUE)
residuals_NASDAQ_std <- residuals(garch_NASDAQ_std, standardize = TRUE)
residuals_nikkei_std <- residuals(garch_nikkei_std, standardize = TRUE)

# For skewed t-distribution
residuals_sp500_sstd <- residuals(garch_sp500_sstd, standardize = TRUE)
residuals_cac40_sstd <- residuals(garch_cac40_sstd, standardize = TRUE)
residuals_NASDAQ_sstd <- residuals(garch_NASDAQ_sstd, standardize = TRUE)
residuals_nikkei_sstd <- residuals(garch_nikkei_sstd, standardize = TRUE)

# Plot the residuals and their ACF to assess quality
# Using plot
par(mfrow = c(2, 2))
plot(residuals_sp500_std, main="SP500")
plot(residuals_cac40_std, main="CAC40")
plot(residuals_NASDAQ_std, main="NASDAQ")
plot(residuals_nikkei_std, main="NIKKEI")
mtext("Residuals of Garch Model with standardized t-distribution", outer = TRUE, cex = 1, line = -1)

Code
par(mfrow = c(2, 2))
acf(residuals_sp500_std^2, main="SP500")
acf(residuals_cac40_std^2, main="CAC40")
acf(residuals_NASDAQ_std^2, main="NASDAQ")
acf(residuals_nikkei_std^2, main="NIKKEI")
mtext("ACF of Squared Residuals with standardized t-distribution", outer = TRUE, cex = 1, line = -1)

Code

# For skewed distribution
par(mfrow = c(2, 2))
plot(residuals_sp500_sstd, main="Residuals of SP500 GARCH Model with sstd t-dist")
plot(residuals_cac40_sstd, main="Residuals of CAC40 GARCH Model with sstd t-dist")
plot(residuals_NASDAQ_sstd, main="Residuals of NASDAQ GARCH Model with sstd t-dist")
plot(residuals_nikkei_sstd, main="Residuals of NIKKEI GARCH Model with sstd t-dist")
mtext("Residuals of Garch Model with skewed t-distribution", outer = TRUE, cex = 1, line = -1)

Code
par(mfrow = c(2, 2))
acf(residuals_sp500_sstd^2, main="SP500")
acf(residuals_cac40_sstd^2, main="CAC40")
acf(residuals_NASDAQ_sstd^2, main="NASDAQ")
acf(residuals_nikkei_sstd^2, main="NIKKEI")
mtext("ACF of Squared Residuals with skewed t-distribution", outer = TRUE, cex = 1, line = -1)

Code

# Perform Ljung-Box tests and collect the p-values
p_values_squared_residuals <- c(
  Box.test(residuals_sp500_std^2, lag = 20, type = "Ljung-Box")$p.value,
  Box.test(residuals_sp500_sstd^2, lag = 20, type = "Ljung-Box")$p.value,
  Box.test(residuals_cac40_std^2, lag = 20, type = "Ljung-Box")$p.value,
  Box.test(residuals_cac40_sstd^2, lag = 20, type = "Ljung-Box")$p.value,
  Box.test(residuals_NASDAQ_std^2, lag = 20, type = "Ljung-Box")$p.value,
  Box.test(residuals_NASDAQ_sstd^2, lag = 20, type = "Ljung-Box")$p.value,
  Box.test(residuals_nikkei_std^2, lag = 20, type = "Ljung-Box")$p.value,
  Box.test(residuals_nikkei_sstd^2, lag = 20, type = "Ljung-Box")$p.value
)

# Create a data frame with the results
ljung_box_results <- data.frame(
  Index = rep(c("SP500", "CAC40", "NASDAQ", "NIKKEI"), each = 2),
  Distribution = rep(c("STD", "SSTD"), times = 4),
  P_Value = p_values_squared_residuals
)

# Create a kable
kable_output <- kable(ljung_box_results, format = "html", caption = "Ljung-Box Test Results on Squared Residuals") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F)

# Print the kable to the R console (note: this will not render as HTML in the console)
kable_output
Ljung-Box Test Results on Squared Residuals
Index    Distribution    P_Value
SP500    STD             0.751176895
SP500    SSTD            0.767092017
CAC40    STD             0.784395624
CAC40    SSTD            0.800149964
NASDAQ   STD             0.277930731
NASDAQ   SSTD            0.313847177
NIKKEI   STD             0.996172310
NIKKEI   SSTD            0.996168168

H0: the data are independently distributed, i.e. there is no autocorrelation in the squared standardized residuals. H1: the data exhibit autocorrelation, meaning that past squared values carry statistically significant information about future ones.

All p-values are well above 0.05, so we fail to reject the null hypothesis at any conventional significance level: the squared standardized residuals show no remaining autocorrelation. The fit is therefore satisfactory, as the GARCH(1,1) models adequately capture the volatility clustering in the negative log returns.
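As a self-contained illustration of how the Ljung-Box p-value behaves, the base-R sketch below applies `Box.test()` to simulated white noise (no autocorrelation, so a large p-value is expected) and to a simulated AR(1) series (autocorrelation present, so a small p-value is expected). The simulated series are purely illustrative and are not part of the practical's data.

```r
# Illustrative only: simulated data, not the practical's returns
set.seed(42)
wn  <- rnorm(1000)                                  # white noise: H0 should not be rejected
ar1 <- arima.sim(model = list(ar = 0.6), n = 1000)  # AR(1): H0 should be rejected

Box.test(wn,  lag = 20, type = "Ljung-Box")$p.value  # typically well above 0.05
Box.test(ar1, lag = 20, type = "Ljung-Box")$p.value  # effectively zero
```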

Code
# Determining if the skewed t distribution is better than the standard t distribution
boxplot(residuals_sp500_std - residuals_sp500_sstd)

Code
boxplot(residuals_cac40_std - residuals_cac40_sstd)

Code
boxplot(residuals_NASDAQ_std - residuals_NASDAQ_sstd)

Code
boxplot(residuals_nikkei_std - residuals_nikkei_sstd)

As the boxplots show, the standardized residuals under the standard and skewed t distributions differ only marginally; even if the skewness parameter is statistically significant, it does not change the results in any material way. We therefore proceed with the simpler model, the standard t distribution.
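A more formal way to choose between the two conditional distributions is to compare information criteria. This is a minimal sketch, assuming the fitted objects from above are still in the workspace; fGarch stores normalized information criteria (AIC, BIC, SIC, HQIC) in the `@fit$ics` slot of the fitted object.

```r
# Sketch: compare information criteria of the std and sstd fits for one index
# (objects fitted above; lower AIC/BIC favors that model)
ic_std  <- garch_sp500_std@fit$ics
ic_sstd <- garch_sp500_sstd@fit$ics
rbind(std = ic_std, sstd = ic_sstd)
```

If the skewed fit does not lower AIC/BIC appreciably, the extra `skew` parameter is not worth its cost, which is consistent with the boxplot-based conclusion above.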

1.2.7 (g) Residual serial correlation can be present when fitting a GARCH directly on the negative log returns.

Code
# Step 2: Fit a GARCH(1,1) model on the residuals of the ARMA model
garch_sp500_std <- garchFit(~garch(1, 1), data = arma_sp500_residuals, cond.dist = "std")
## 
## Series Initialization:
##  ARMA Model:                arma
##  Formula Mean:              ~ arma(0, 0)
##  GARCH Model:               garch
##  Formula Variance:          ~ garch(1, 1)
##  ARMA Order:                0 0
##  Max ARMA Order:            0
##  GARCH Order:               1 1
##  Max GARCH Order:           1
##  Maximum Order:             1
##  Conditional Dist:          std
##  h.start:                   2
##  llh.start:                 1
##  Length of Series:          3587
##  Recursion Init:            mci
##  Series Scale:              0.0104796183
## 
## Parameter Initialization:
##  Initial Parameters:          $params
##  Limits of Transformations:   $U, $V
##  Which Parameters are Fixed?  $includes
##  Parameter Matrix:
##                         U               V         params includes
##     mu     -0.00214894176   0.00214894176 0.000214894176     TRUE
##     omega   0.00000100000 100.00000000000 0.100000000000     TRUE
##     alpha1  0.00000001000   0.99999999000 0.100000000000     TRUE
##     gamma1 -0.99999999000   0.99999999000 0.100000000000    FALSE
##     beta1   0.00000001000   0.99999999000 0.800000000000     TRUE
##     delta   0.00000000000   2.00000000000 2.000000000000    FALSE
##     skew    0.10000000000  10.00000000000 1.000000000000    FALSE
##     shape   1.00000000000  10.00000000000 4.000000000000     TRUE
##  Index List of Parameters to be Optimized:
##     mu  omega alpha1  beta1  shape 
##      1      2      3      5      8 
##  Persistence:                  0.9 
## 
## 
## --- START OF TRACE ---
## Selected Algorithm: nlminb 
## 
## R coded nlminb Solver: 
## 
##   0:     4667.5075: 0.000214894 0.100000 0.100000 0.800000  4.00000
##   1:     4650.9741: 0.000214894 0.0958297 0.126787 0.816473  4.00075
##   2:     4637.4409: 0.000214894 0.0656361 0.129885 0.807221  4.00079
##   3:     4601.7197: 0.000214893 0.0349491 0.174636 0.840080  4.00262
##   4:     4599.5242: 0.000214893 0.0276214 0.172495 0.839698  4.00273
##   5:     4597.5768: 0.000214893 0.0280450 0.173355 0.847275  4.00306
##   6:     4593.5124: 0.000214892 0.0171943 0.167827 0.856497  4.00371
##   7:     4589.7459: 0.000214891 0.0162512 0.161430 0.870321  4.00462
##   8:     4581.8304: 0.000214889 0.00870456 0.135834 0.889866  4.00722
##   9:     4577.0941: 0.000214887 0.0115466 0.115834 0.901044  4.00966
##  10:     4572.7375: 0.000214885 0.00571658 0.0994963 0.915999  4.01345
##  11:     4571.4722: 0.000214884 0.00530996 0.0957546 0.925350  4.01543
##  12:     4569.1731: 0.000214882 0.00282339 0.0880839 0.929931  4.01985
##  13:     4567.5511: 0.000214879 0.00535335 0.0802951 0.932169  4.02564
##  14:     4564.8847: 0.000214873 0.00221371 0.0640911 0.947996  4.04017
##  15:     4564.8214: 0.000214873 0.00217124 0.0637911 0.947741  4.04019
##  16:     4564.7858: 0.000214873 0.00255272 0.0636959 0.947785  4.04021
##  17:     4564.7291: 0.000214872 0.00240984 0.0634476 0.947667  4.04094
##  18:     4564.6734: 0.000214872 0.00263467 0.0632274 0.947793  4.04249
##  19:     4562.6466: 0.000214823 0.00190468 0.0550373 0.953487  4.14344
##  20:     4556.3365: 0.000214365 0.00249583 0.0709092 0.936308  4.89451
##  21:     4555.6225: 0.000213531 0.00616376 0.0670444 0.935289  5.02155
##  22:     4553.4399: 0.000212085 0.00382102 0.0630895 0.941740  5.14833
##  23:     4552.9415: 0.000212085 0.00323836 0.0627866 0.941427  5.14833
##  24:     4552.9137: 0.000212077 0.00317763 0.0624390 0.941434  5.14897
##  25:     4552.8795: 0.000212060 0.00331473 0.0623766 0.941566  5.15041
##  26:     4552.8346: 0.000212026 0.00322441 0.0620267 0.941679  5.15329
##  27:     4550.4119: 0.000203458 0.00113701 0.0412723 0.959094  5.87395
##  28:     4548.4433: 0.000193158 0.00367443 0.0502032 0.948621  6.23247
##  29:     4547.8495: 0.000181692 0.00302808 0.0522070 0.947388  6.49152
##  30:     4547.5783: 0.000160476 0.00272620 0.0493366 0.949172  6.89645
##  31:     4547.5251: 0.000149469 0.00258311 0.0489630 0.950087  6.93605
##  32:     4547.5143: 0.000138170 0.00270783 0.0489239 0.949896  6.97701
##  33:     4547.5104: 0.000121549 0.00273891 0.0489957 0.949742  7.01722
##  34:     4547.5061: 9.14603e-05 0.00276124 0.0491298 0.949562  7.05013
##  35:     4547.4954: 3.09342e-06 0.00279303 0.0493514 0.949277  7.10205
##  36:     4547.4698: -0.000226978 0.00283901 0.0496817 0.948855  7.17990
##  37:     4547.4031: -0.000868114 0.00291456 0.0502281 0.948159  7.31197
##  38:     4547.1936: -0.00214894 0.00292368 0.0504612 0.947917  7.35904
##  39:     4547.0979: -0.00214894 0.00278310 0.0496271 0.949043  7.12585
##  40:     4547.0844: -0.00214894 0.00272277 0.0491497 0.949646  6.98638
##  41:     4547.0841: -0.00214894 0.00272951 0.0491619 0.949617  7.00444
##  42:     4547.0841: -0.00214894 0.00272971 0.0491589 0.949619  7.00372
##  43:     4547.0841: -0.00214894 0.00272977 0.0491586 0.949619  7.00371
## 
## Final Estimate of the Negative LLH:
##  LLH:  -11803.6205    norm LLH:  -3.29066645 
##                 mu              omega             alpha1 
## -0.000022520089460  0.000000299790366  0.049158614498107 
##              beta1              shape 
##  0.949619108119116  7.003710175246678 
## 
## R-optimhess Difference Approximated Hessian Matrix:
##                       mu               omega             alpha1
## mu      -63710506.851443       2995532287.73       83767.117168
## omega  2995532287.726589 -180095662808472.34 -6648328064.890606
## alpha1      83767.117168      -6648328064.89     -428089.923653
## beta1      185990.221377      -8310474688.76     -483094.129648
## shape         331.684771         -7241370.47        -489.699711
##                     beta1             shape
## mu          185990.221377      331.68477149
## omega  -8310474688.760022 -7241370.47098648
## alpha1     -483094.129648     -489.69971090
## beta1      -572473.283785     -568.05861266
## shape         -568.058613       -2.18079981
## attr(,"time")
## Time difference of 0.0348360538 secs
## 
## --- END OF TRACE ---
## 
## 
## Time to Estimate Parameters:
##  Time difference of 0.146663189 secs
garch_cac40_std <- garchFit(~garch(1, 1), data = arma_cac40_residuals, cond.dist = "std")
## 
## Series Initialization:
##  ARMA Model:                arma
##  Formula Mean:              ~ arma(0, 0)
##  GARCH Model:               garch
##  Formula Variance:          ~ garch(1, 1)
##  ARMA Order:                0 0
##  Max ARMA Order:            0
##  GARCH Order:               1 1
##  Max GARCH Order:           1
##  Maximum Order:             1
##  Conditional Dist:          std
##  h.start:                   2
##  llh.start:                 1
##  Length of Series:          2576
##  Recursion Init:            mci
##  Series Scale:              0.0146431753
## 
## Parameter Initialization:
##  Initial Parameters:          $params
##  Limits of Transformations:   $U, $V
##  Which Parameters are Fixed?  $includes
##  Parameter Matrix:
##                       U             V        params includes
##     mu     -0.117670784   0.117670784 -0.0117670784     TRUE
##     omega   0.000001000 100.000000000  0.1000000000     TRUE
##     alpha1  0.000000010   0.999999990  0.1000000000     TRUE
##     gamma1 -0.999999990   0.999999990  0.1000000000    FALSE
##     beta1   0.000000010   0.999999990  0.8000000000     TRUE
##     delta   0.000000000   2.000000000  2.0000000000    FALSE
##     skew    0.100000000  10.000000000  1.0000000000    FALSE
##     shape   1.000000000  10.000000000  4.0000000000     TRUE
##  Index List of Parameters to be Optimized:
##     mu  omega alpha1  beta1  shape 
##      1      2      3      5      8 
##  Persistence:                  0.9 
## 
## 
## --- START OF TRACE ---
## Selected Algorithm: nlminb 
## 
## R coded nlminb Solver: 
## 
##   0:     3473.1134: -0.0117671 0.100000 0.100000 0.800000  4.00000
##   1:     3456.7666: -0.0117674 0.112453 0.116246 0.817069  4.00086
##   2:     3451.4493: -0.0117685 0.0878698 0.126467 0.817068  4.00237
##   3:     3441.8585: -0.0117716 0.0571051 0.147142 0.855099  4.00726
##   4:     3437.5349: -0.0117766 0.0162732 0.130676 0.884361  4.01429
##   5:     3435.4808: -0.0117814 0.0193791 0.124148 0.906139  4.01993
##   6:     3429.0853: -0.0117939 0.0203766 0.107791 0.903403  4.03670
##   7:     3427.4583: -0.0118006 0.0132761 0.0940258 0.918867  4.04560
##   8:     3427.2636: -0.0118007 0.0140768 0.0942408 0.919550  4.04567
##   9:     3427.1700: -0.0118012 0.0143856 0.0918127 0.919680  4.04644
##  10:     3426.9224: -0.0118044 0.0150911 0.0900837 0.921493  4.05086
##  11:     3425.6804: -0.0118348 0.0142996 0.0783797 0.927959  4.09386
##  12:     3424.7120: -0.0118670 0.0137842 0.0795825 0.929260  4.13886
##  13:     3420.9242: -0.0120300 0.00795408 0.0829204 0.928359  4.36641
##  14:     3416.9718: -0.0122332 0.0149677 0.0917629 0.915763  4.59321
##  15:     3413.6532: -0.0126697 0.0154080 0.0858640 0.913685  4.81816
##  16:     3409.4590: -0.0135515 0.0113872 0.0692442 0.931006  5.03219
##  17:     3405.0575: -0.0159261 0.00772165 0.0808620 0.925919  5.48763
##  18:     3394.5979: -0.0257942 0.0128761 0.0723392 0.918469  6.80208
##  19:     3393.7386: -0.0384438 0.0239538 0.0832708 0.901997  7.93129
##  20:     3386.5788: -0.0485677 0.0153625 0.0680843 0.916751  9.23148
##  21:     3385.2923: -0.0503223 0.0134750 0.0700083 0.918288  9.63869
##  22:     3384.6830: -0.0495229 0.00977438 0.0693476 0.923064  9.80355
##  23:     3383.9296: -0.0421525 0.00957911 0.0649925 0.927826  10.0000
##  24:     3383.8519: -0.0363361 0.00966461 0.0645899 0.927963  10.0000
##  25:     3383.8501: -0.0359327 0.00972077 0.0649489 0.927534  10.0000
##  26:     3383.8500: -0.0359438 0.00972675 0.0649086 0.927525  10.0000
##  27:     3383.8500: -0.0359730 0.00973794 0.0649280 0.927496  10.0000
##  28:     3383.8500: -0.0359738 0.00973574 0.0649255 0.927501  10.0000
## 
## Final Estimate of the Negative LLH:
##  LLH:  -7496.6096    norm LLH:  -2.91017453 
##                mu             omega            alpha1 
## -0.00052677030655  0.00000208756159  0.06492546081799 
##             beta1             shape 
##  0.92750085138243 10.00000000000000 
## 
## R-optimhess Difference Approximated Hessian Matrix:
##                      mu              omega             alpha1
## mu     -18260599.174983      303929886.979       16825.252897
## omega  303929886.979339 -9976307385770.393 -1056039821.980164
## alpha1     16825.252897    -1056039821.980     -158949.129072
## beta1      39412.875452    -1335523507.201     -179444.533131
## shape        133.818147        -797872.525        -108.933698
##                     beta1             shape
## mu           39412.875452     133.818147445
## omega  -1335523507.201395 -797872.524684289
## alpha1     -179444.533131    -108.933697788
## beta1      -218738.302210    -129.427424497
## shape         -129.427424      -0.618274044
## attr(,"time")
## Time difference of 0.0219061375 secs
## 
## --- END OF TRACE ---
## 
## 
## Time to Estimate Parameters:
##  Time difference of 0.0720720291 secs
garch_NASDAQ_std <- garchFit(~garch(1, 1), data = arma_NASDAQ_residuals, cond.dist = "std")
## 
## Series Initialization:
##  ARMA Model:                arma
##  Formula Mean:              ~ arma(0, 0)
##  GARCH Model:               garch
##  Formula Variance:          ~ garch(1, 1)
##  ARMA Order:                0 0
##  Max ARMA Order:            0
##  GARCH Order:               1 1
##  Max GARCH Order:           1
##  Maximum Order:             1
##  Conditional Dist:          std
##  h.start:                   2
##  llh.start:                 1
##  Length of Series:          2576
##  Recursion Init:            mci
##  Series Scale:              0.0179380315
## 
## Parameter Initialization:
##  Initial Parameters:          $params
##  Limits of Transformations:   $U, $V
##  Which Parameters are Fixed?  $includes
##  Parameter Matrix:
##                       U             V        params includes
##     mu     -0.200050292   0.200050292 -0.0200050292     TRUE
##     omega   0.000001000 100.000000000  0.1000000000     TRUE
##     alpha1  0.000000010   0.999999990  0.1000000000     TRUE
##     gamma1 -0.999999990   0.999999990  0.1000000000    FALSE
##     beta1   0.000000010   0.999999990  0.8000000000     TRUE
##     delta   0.000000000   2.000000000  2.0000000000    FALSE
##     skew    0.100000000  10.000000000  1.0000000000    FALSE
##     shape   1.000000000  10.000000000  4.0000000000     TRUE
##  Index List of Parameters to be Optimized:
##     mu  omega alpha1  beta1  shape 
##      1      2      3      5      8 
##  Persistence:                  0.9 
## 
## 
## --- START OF TRACE ---
## Selected Algorithm: nlminb 
## 
## R coded nlminb Solver: 
## 
##   0:     3226.3064: -0.0200050 0.100000 0.100000 0.800000  4.00000
##   1:     3134.0812: -0.0200124 0.0399611 0.170388 0.811242  4.00111
##   2:     3113.2292: -0.0200286 1.00000e-06 0.229477 0.865777  4.00433
##   3:     3105.2197: -0.0200324 0.00634282 0.218919 0.852014  4.00440
##   4:     3104.8534: -0.0200343 0.00590239 0.216434 0.850601  4.00456
##   5:     3104.5671: -0.0200369 0.00830805 0.214846 0.850475  4.00481
##   6:     3104.1991: -0.0200413 0.00757890 0.212390 0.849196  4.00520
##   7:     3103.9610: -0.0200484 0.00905805 0.210103 0.849784  4.00591
##   8:     3099.0136: -0.0203139 0.00276342 0.164417 0.887222  4.03297
##   9:     3098.2098: -0.0205838 0.00839727 0.108670 0.905556  4.06089
##  10:     3097.3118: -0.0207164 1.00000e-06 0.113357 0.923770  4.07704
##  11:     3096.2445: -0.0207171 0.00330134 0.111566 0.921090  4.07710
##  12:     3094.9475: -0.0207317 0.00115290 0.107977 0.920382  4.07877
##  13:     3094.0253: -0.0207549 0.00289392 0.105167 0.921444  4.08157
##  14:     3093.8365: -0.0207835 0.00274417 0.102797 0.921816  4.08524
##  15:     3088.4241: -0.0237975 0.00605631 0.0915801 0.918494  4.49620
##  16:     3081.6222: -0.0268158 0.00370856 0.106293 0.905891  4.90679
##  17:     3078.5012: -0.0301102 0.00765725 0.120748 0.893686  5.31205
##  18:     3074.0463: -0.0341088 0.00577597 0.130457 0.881718  5.70133
##  19:     3064.5326: -0.0603700 0.00536242 0.108311 0.892783  6.86056
##  20:     3063.8489: -0.0659457 0.00699199 0.124472 0.878876  7.31597
##  21:     3061.2364: -0.0694491 0.00497512 0.109945 0.893146  7.82041
##  22:     3059.5420: -0.0582390 0.00318174 0.0886041 0.910286  9.20867
##  23:     3058.5455: -0.0625835 0.00266304 0.0916109 0.912687  9.56274
##  24:     3058.3557: -0.0618580 0.00304937 0.0925515 0.910515  9.71448
##  25:     3058.2129: -0.0617547 0.00320802 0.0930590 0.909055  10.0000
##  26:     3058.2104: -0.0617692 0.00327325 0.0932450 0.909000  10.0000
##  27:     3058.2098: -0.0618668 0.00323159 0.0932270 0.909068  10.0000
##  28:     3058.2098: -0.0618745 0.00322665 0.0931657 0.909129  10.0000
##  29:     3058.2098: -0.0618715 0.00322658 0.0931736 0.909124  10.0000
##  30:     3058.2098: -0.0618715 0.00322664 0.0931732 0.909124  10.0000
## 
## Final Estimate of the Negative LLH:
##  LLH:  -7299.45387    norm LLH:  -2.83363892 
##                mu             omega            alpha1 
## -0.00110985352969  0.00000103824621  0.09317322282767 
##             beta1             shape 
##  0.90912439839470 10.00000000000000 
## 
## R-optimhess Difference Approximated Hessian Matrix:
##                      mu               omega            alpha1
## mu     -21314830.528352       339252070.093      29330.453520
## omega  339252070.093101 -12712958920886.352 -733167857.875236
## alpha1     29330.453520      -733167857.875     -96253.537685
## beta1      69166.316267     -1011745238.093    -114434.572966
## shape        297.725389         -556260.064        -75.333673
##                      beta1             shape
## mu           69166.3162671     297.725389197
## omega  -1011745238.0934393 -556260.064412848
## alpha1     -114434.5729658     -75.333672953
## beta1      -144410.3701562     -92.141900836
## shape          -92.1419008      -0.436097253
## attr(,"time")
## Time difference of 0.0229189396 secs
## 
## --- END OF TRACE ---
## 
## 
## Time to Estimate Parameters:
##  Time difference of 0.0806369781 secs
garch_nikkei_std <- garchFit(~garch(1, 1), data = arma_nikkei_residuals, cond.dist = "std")
## 
## Series Initialization:
##  ARMA Model:                arma
##  Formula Mean:              ~ arma(0, 0)
##  GARCH Model:               garch
##  Formula Variance:          ~ garch(1, 1)
##  ARMA Order:                0 0
##  Max ARMA Order:            0
##  GARCH Order:               1 1
##  Max GARCH Order:           1
##  Maximum Order:             1
##  Conditional Dist:          std
##  h.start:                   2
##  llh.start:                 1
##  Length of Series:          2519
##  Recursion Init:            mci
##  Series Scale:              0.0147693665
## 
## Parameter Initialization:
##  Initial Parameters:          $params
##  Limits of Transformations:   $U, $V
##  Which Parameters are Fixed?  $includes
##  Parameter Matrix:
##                       U             V       params includes
##     mu     -0.118032283   0.118032283 0.0118032283     TRUE
##     omega   0.000001000 100.000000000 0.1000000000     TRUE
##     alpha1  0.000000010   0.999999990 0.1000000000     TRUE
##     gamma1 -0.999999990   0.999999990 0.1000000000    FALSE
##     beta1   0.000000010   0.999999990 0.8000000000     TRUE
##     delta   0.000000000   2.000000000 2.0000000000    FALSE
##     skew    0.100000000  10.000000000 1.0000000000    FALSE
##     shape   1.000000000  10.000000000 4.0000000000     TRUE
##  Index List of Parameters to be Optimized:
##     mu  omega alpha1  beta1  shape 
##      1      2      3      5      8 
##  Persistence:                  0.9 
## 
## 
## --- START OF TRACE ---
## Selected Algorithm: nlminb 
## 
## R coded nlminb Solver: 
## 
##   0:     3468.5376: 0.0118032 0.100000 0.100000 0.800000  4.00000
##   1:     3451.9020: 0.0118031 0.114481 0.114714 0.817404  4.00075
##   2:     3443.3529: 0.0118009 0.0515337 0.150940 0.843866  4.00769
##   3:     3442.9739: 0.0118008 0.0525652 0.150431 0.847625  4.00803
##   4:     3442.6077: 0.0118007 0.0499710 0.147494 0.847966  4.00834
##   5:     3441.7988: 0.0118003 0.0479726 0.145619 0.855295  4.00936
##   6:     3440.2577: 0.0117994 0.0367880 0.139830 0.864543  4.01157
##   7:     3439.2028: 0.0117964 0.0210496 0.134021 0.890150  4.01925
##   8:     3438.6335: 0.0117858 0.0159138 0.123347 0.890522  4.04849
##   9:     3435.4234: 0.0117743 0.0255583 0.118423 0.890436  4.07812
##  10:     3434.3584: 0.0117627 0.0253332 0.109716 0.892338  4.10838
##  11:     3430.4972: 0.0117077 0.00763983 0.0855599 0.926762  4.24382
##  12:     3430.1907: 0.0117077 0.00987285 0.0849850 0.927055  4.24391
##  13:     3429.8342: 0.0117070 0.00944723 0.0837634 0.926010  4.24553
##  14:     3429.5670: 0.0117052 0.0108491 0.0831217 0.926513  4.24989
##  15:     3429.2397: 0.0117013 0.0106630 0.0815530 0.926087  4.25904
##  16:     3428.7970: 0.0116931 0.0116414 0.0801755 0.927109  4.27753
##  17:     3424.8980: 0.0115400 0.0105791 0.0605005 0.936757  4.60805
##  18:     3421.5449: 0.0113132 0.0141715 0.0729510 0.924790  4.93852
##  19:     3420.2870: 0.0108041 0.0147280 0.0803982 0.914071  5.26694
##  20:     3417.4079: 0.00490777 0.0112195 0.0515887 0.936003  6.14792
##  21:     3414.1980: -0.000132749 0.0171064 0.0670902 0.920178  6.52477
##  22:     3413.2239: -0.00265596 0.0131168 0.0664864 0.924317  6.95927
##  23:     3412.4799: -0.00507515 0.0126877 0.0612893 0.927343  7.73549
##  24:     3412.3602: -0.00377699 0.0127658 0.0601994 0.928200  7.98986
##  25:     3412.3050: -0.00159957 0.0128904 0.0609818 0.927325  8.24421
##  26:     3412.3024: -0.00107230 0.0130784 0.0613583 0.926785  8.29818
##  27:     3412.3024: -0.00104717 0.0130405 0.0613783 0.926808  8.30380
##  28:     3412.3024: -0.00106111 0.0130496 0.0613788 0.926798  8.30351
##  29:     3412.3024: -0.00106861 0.0130504 0.0613795 0.926797  8.30370
## 
## Final Estimate of the Negative LLH:
##  LLH:  -7205.7866    norm LLH:  -2.86057428 
##                mu             omega            alpha1 
## -0.00001578276056  0.00000284674269  0.06137948930882 
##             beta1             shape 
##  0.92679655857120  8.30370089007517 
## 
## R-optimhess Difference Approximated Hessian Matrix:
##                       mu              omega            alpha1
## mu     -16015181.2452220       61915652.951      -2497.047784
## omega   61915652.9508844 -6700682118294.088 -856915733.981776
## alpha1     -2497.0477840     -856915733.982    -150630.052764
## beta1       5504.9083991    -1072045992.470    -167394.005688
## shape        -18.3997416        -884074.589       -149.003828
##                     beta1             shape
## mu            5504.908399     -18.399741572
## omega  -1072045992.469797 -884074.588896993
## alpha1     -167394.005688    -149.003827858
## beta1      -199658.086337    -177.586622556
## shape         -177.586623      -0.831361376
## attr(,"time")
## Time difference of 0.0246810913 secs
## 
## --- END OF TRACE ---
## 
## 
## Time to Estimate Parameters:
##  Time difference of 0.0750880241 secs
Code
# Step 3: Assess the quality of the fit
residuals_sp500_garch <- residuals(garch_sp500_std, standardize = TRUE)
residuals_cac40_garch <- residuals(garch_cac40_std, standardize = TRUE)
residuals_NASDAQ_garch <- residuals(garch_NASDAQ_std, standardize = TRUE)
residuals_nikkei_garch <- residuals(garch_nikkei_std, standardize = TRUE)

par(mfrow = c(2, 2))
plot(residuals_sp500_garch, main="SP500 residuals with std t-dist")
plot(residuals_cac40_garch, main="CAC40 residuals with std t-dist")
plot(residuals_NASDAQ_garch, main="NASDAQ residuals with std t-dist")
plot(residuals_nikkei_garch, main="NIKKEI residuals with std t-dist")

Code

Box.test(residuals_sp500_garch, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_sp500_garch
## X-squared = 29.20505, df = 20, p-value = 0.0837997
Box.test(residuals_cac40_garch, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_cac40_garch
## X-squared = 22.33086, df = 20, p-value = 0.32283
Box.test(residuals_NASDAQ_garch, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_NASDAQ_garch
## X-squared = 27.29709, df = 20, p-value = 0.127111
Box.test(residuals_nikkei_garch, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_nikkei_garch
## X-squared = 23.63077, df = 20, p-value = 0.258893

Box.test(residuals_sp500_garch^2, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_sp500_garch^2
## X-squared = 16.12304, df = 20, p-value = 0.708962
Box.test(residuals_cac40_garch^2, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_cac40_garch^2
## X-squared = 14.8594, df = 20, p-value = 0.784396
Box.test(residuals_NASDAQ_garch^2, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_NASDAQ_garch^2
## X-squared = 23.44418, df = 20, p-value = 0.267514
Box.test(residuals_nikkei_garch^2, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_nikkei_garch^2
## X-squared = 7.058748, df = 20, p-value = 0.996488

The Ljung-Box tests on both the residuals and the squared residuals yield p-values above 0.05, so we cannot reject the null hypothesis of no autocorrelation in the residuals of the GARCH(1,1) model. The GARCH(1,1) therefore provides a good-quality fit, although the evidence is somewhat weaker for the SP500 (p-value of about 0.084 on the residuals).
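The eight calls above can be condensed into a single table of p-values, which makes the comparison across indices easier to read. A minimal sketch, using placeholder white-noise vectors in place of the actual standardized-residual objects (which are not recreated here):

```r
# Sketch: collect Ljung-Box p-values for several residual series into one
# named vector. The vectors below are placeholders standing in for the
# standardized GARCH residuals; p > 0.05 means we cannot reject the null
# of no autocorrelation at the 5% level.
set.seed(1)
resid_list <- list(sp500  = rnorm(2576), cac40  = rnorm(2576),
                   nasdaq = rnorm(2576), nikkei = rnorm(2519))
lb_pvals <- sapply(resid_list,
                   function(z) Box.test(z, lag = 20, type = "Ljung-Box")$p.value)
round(lb_pvals, 3)
```

The same `sapply()` pattern applies directly to the squared residuals by mapping over `lapply(resid_list, function(z) z^2)`.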

1.2.8 (h) Use the garchAuto.R script in order to fit a GARCH on the residuals of the ARMA(p,q) from (g). Assess the quality of the fit.

Code
# Auto GARCH formula
garchAutoTryFit = function(
    ll,
    data,
    trace=FALSE,
    forecast.length=1,
    with.forecast=TRUE,
    ic="AIC",
    garch.model="garch" )
{
  formula = as.formula( paste( sep="",
                               "~ arma(", ll$order[1], ",", ll$order[2], ")+",
                               garch.model,
                               "(", ll$order[3], ",", ll$order[4], ")" ) )
  fit = tryCatch( garchFit( formula=formula,
                            data=data,
                            trace=FALSE,
                            cond.dist=ll$dist ),
                  error=function( err ) TRUE,
                  warning=function( warn ) FALSE )
  
  pp = NULL
  
  if( !is.logical( fit ) ) {
    if( with.forecast ) {
      pp = tryCatch( predict( fit,
                              n.ahead=forecast.length,
                              doplot=FALSE ),
                     error=function( err ) FALSE,
                     warning=function( warn ) FALSE )
      if( is.logical( pp ) ) {
        fit = NULL
      }
    }
  } else {
    fit = NULL
  }
  
  if( trace ) {
    if( is.null( fit ) ) {
      cat( paste( sep="",
                  "   Analyzing (", ll$order[1], ",", ll$order[2],
                  ",", ll$order[3], ",", ll$order[4], ") with ",
                  ll$dist, " distribution done.",
                  "Bad model.\n" ) )
    } else {
      if( with.forecast ) {
        cat( paste( sep="",
                    "   Analyzing (", ll$order[1], ",", ll$order[2], ",",
                    ll$order[3], ",", ll$order[4], ") with ",
                    ll$dist, " distribution done.",
                    "Good model. ", ic, " = ", round(fit@fit$ics[[ic]],6),
                    ", forecast: ",
                    paste( collapse=",", round(pp[,1],4) ), "\n" ) )
      } else {
        cat( paste( sep="",
                    "   Analyzing (", ll[1], ",", ll[2], ",", ll[3], ",", ll[4], ") with ",
                    ll$dist, " distribution done.",
                    "Good model. ", ic, " = ", round(fit@fit$ics[[ic]],6), "\n" ) )
      }
    }
  }
  
  return( fit )
}

garchAuto = function(
    xx,
    min.order=c(0,0,1,1),
    max.order=c(5,5,1,1),
    trace=FALSE,
    cond.dists="sged",
    with.forecast=TRUE,
    forecast.length=1,
    arma.sum=c(0,1e9),
    cores=1,
    ic="AIC",
    garch.model="garch" )
{
  require( fGarch )
  require( parallel )
  
  len = NROW( xx )
  
  models = list( )
  
  for( dist in cond.dists )
    for( p in min.order[1]:max.order[1] )
      for( q in min.order[2]:max.order[2] )
        for( r in min.order[3]:max.order[3] )
          for( s in min.order[4]:max.order[4] )
          {
            pq.sum = p + q
            if( pq.sum <= arma.sum[2] && pq.sum >= arma.sum[1] )
            {
              models[[length( models ) + 1]] = list( order=c( p, q, r, s ), dist=dist )
            }
          }
  
  res = mclapply( models,
                  garchAutoTryFit,
                  data=xx,
                  trace=trace,
                  ic=ic,
                  garch.model=garch.model,
                  forecast.length=forecast.length,
                  with.forecast=TRUE,
                  mc.cores=cores )
  
  best.fit = NULL
  
  best.ic = 1e9
  for( rr in res )
  {
    if( !is.null( rr ) )
    {
      current.ic = rr@fit$ics[[ic]]
      if( current.ic < best.ic )
      {
        best.ic = current.ic
        best.fit = rr
      }
    }
  }
  
  if( best.ic < 1e9 )
  {
    return( best.fit )
  }
  
  return( NULL )
}
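The core of `garchAuto` is a grid search over candidate orders that keeps the fit with the lowest information criterion. The same selection logic can be illustrated in base R, here with `arima()` as a stand-in for `garchFit()` so the sketch runs without the fGarch package:

```r
# Sketch of garchAuto's selection logic: enumerate candidate (p, q) orders,
# fit each model, and retain the one with the lowest AIC. arima() is used
# as a stand-in for garchFit() to keep the example base-R only.
set.seed(7)
x <- arima.sim(list(ar = 0.5), n = 500)   # toy AR(1) series

best.order <- NULL
best.aic   <- Inf
for (p in 0:2) for (q in 0:2) {
  fit <- tryCatch(arima(x, order = c(p, 0, q)),   # skip orders that fail
                  error = function(e) NULL)
  if (!is.null(fit) && AIC(fit) < best.aic) {
    best.aic   <- AIC(fit)
    best.order <- c(p, q)
  }
}
best.order   # chosen (p, q); an AR(1)-like order is expected for this data
```

`garchAuto` adds two refinements on top of this loop: failed or non-forecastable fits are discarded via `tryCatch` on both `garchFit` and `predict`, and the candidate fits are evaluated in parallel with `mclapply`.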
Code
#Fit the AutoGarch on the ARMA residuals
autogarch_sp500_std <- garchAuto(arma_sp500_residuals)
## Loading required package: parallel
autogarch_cac40_std <- garchAuto(arma_cac40_residuals)
autogarch_NASDAQ_std <- garchAuto(arma_NASDAQ_residuals)
autogarch_nikkei_std <- garchAuto(arma_nikkei_residuals)

#Get the standardized residuals of the AutoGarch
residuals_sp500_autogarch <- residuals(autogarch_sp500_std, standardize = TRUE)
residuals_cac40_autogarch <- residuals(autogarch_cac40_std, standardize = TRUE)
residuals_NASDAQ_autogarch <- residuals(autogarch_NASDAQ_std, standardize = TRUE)
residuals_nikkei_autogarch <- residuals(autogarch_nikkei_std, standardize = TRUE)

par(mfrow = c(2, 2))
plot(residuals_sp500_autogarch, main="SP500 residuals with std t-dist")
plot(residuals_cac40_autogarch, main="CAC40 residuals with std t-dist")
plot(residuals_NASDAQ_autogarch, main="NASDAQ residuals with std t-dist")
plot(residuals_nikkei_autogarch, main="NIKKEI residuals with std t-dist")

Code

#Assess the quality of the fit
Box.test(residuals_sp500_autogarch, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_sp500_autogarch
## X-squared = 32.82426, df = 20, p-value = 0.0352692
Box.test(residuals_cac40_autogarch, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_cac40_autogarch
## X-squared = 19.67275, df = 20, p-value = 0.478561
Box.test(residuals_NASDAQ_autogarch, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_NASDAQ_autogarch
## X-squared = 27.86324, df = 20, p-value = 0.112676
Box.test(residuals_nikkei_autogarch, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_nikkei_autogarch
## X-squared = 17.4285, df = 20, p-value = 0.624999

Box.test(residuals_sp500_autogarch^2, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_sp500_autogarch^2
## X-squared = 15.63696, df = 20, p-value = 0.738876
Box.test(residuals_cac40_autogarch^2, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_cac40_autogarch^2
## X-squared = 16.42077, df = 20, p-value = 0.690201
Box.test(residuals_NASDAQ_autogarch^2, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_NASDAQ_autogarch^2
## X-squared = 22.03125, df = 20, p-value = 0.338818
Box.test(residuals_nikkei_autogarch^2, lag = 20, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_nikkei_autogarch^2
## X-squared = 7.595681, df = 20, p-value = 0.994223

The Ljung-Box tests on the residuals of the models selected by `garchAuto` also give non-significant p-values, with the exception of the SP500. For the CAC40, NASDAQ and Nikkei we cannot reject the null hypothesis of no autocorrelation in the residuals, but for the SP500 we reject it at the 5% level (p-value of about 0.035), suggesting some serial correlation remains. The squared residuals of the SP500, however, show a high p-value, so no remaining ARCH effects are detected.

We conclude that the SP500 residuals may still exhibit some autocorrelation, depending on whether we give more weight to the test on the residuals or the one on the squared residuals; the fit is acceptable overall but weakest for the SP500.

2 Practical 2

2.1 Part 1: Venice

2.1.1 (a) Read in the data. Extract and represent the yearly max values from 1940 to 2009. What do you observe ?

Code
# Read the Data
venice <- venice90
# Using block maxima approach to have the maximum sea level per year 
venice_max <- venice %>%
  group_by(year) %>%
  summarise(max_sea_level = max(sealevel))
head(venice_max)
## # A tibble: 6 x 2
##    year max_sea_level
##   <int>         <dbl>
## 1  1940           101
## 2  1941            99
## 3  1942            93
## 4  1943            96
## 5  1944           106
## 6  1945           104
plot_ly(venice_max, x = ~year, y = ~max_sea_level, type = 'scatter', mode = 'markers', name = 'Max Value') %>% layout(title = "Maximum Value per year", xaxis = list(title="Year"), yaxis = list(title="Maximum Value"))  %>% add_segments(x = 1940, xend = 2009, y = 140, yend = 140, line = list(color = 'red', width = 2))

From the plot, we can see that 11 data points exceed the 140 cm threshold marked by the red line. The highest recorded sea level, 192 cm in 1966, stands out clearly as the peak of the dataset. The distribution of the yearly maxima also hints at a possible upward trend; whether the extremes are stationary or not will be investigated in (d) using a likelihood ratio test.

2.1.2 (b) We are end of 2009 and would like to predict the yearly maximum values over the next 13 years (from 2010 to 2022). A naive approach consists of fitting a linear model on the observed yearly maxima and predict their values for 2010-2022. Proceed to this prediction and provide confidence intervals.

Code
# Create linear model 
mod1 <- lm(max_sea_level ~ year, data = venice_max)
summary(mod1)
## 
## Call:
## lm(formula = max_sea_level ~ year, data = venice_max)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -26.49259 -11.48509  -2.37432   9.97635  71.86521 
## 
## Coefficients:
##                Estimate  Std. Error  t value Pr(>|t|)   
## (Intercept) -430.228256  200.361708 -2.14726 0.035341 * 
## year           0.279941    0.101469  2.75887 0.007445 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.1533 on 68 degrees of freedom
## Multiple R-squared:  0.100664,   Adjusted R-squared:  0.0874386 
## F-statistic: 7.61135 on 1 and 68 DF,  p-value: 0.00744505

# Predictions of 13 next years using the linear model 
mod1_predict <- predict.lm(mod1,newdata=data.frame("year"=c(2010:2022)),se=T, interval = "confidence", level = 0.95)
mod1_predict
## $fit
##           fit        lwr        upr
## 1  132.452174 124.181461 140.722886
## 2  132.732114 124.284836 141.179393
## 3  133.012055 124.387071 141.637039
## 4  133.291995 124.488236 142.095755
## 5  133.571936 124.588395 142.555476
## 6  133.851876 124.687608 143.016145
## 7  134.131817 124.785929 143.477705
## 8  134.411758 124.883408 143.940107
## 9  134.691698 124.980095 144.403301
## 10 134.971639 125.076031 144.867246
## 11 135.251579 125.171260 145.331898
## 12 135.531520 125.265818 145.797221
## 13 135.811460 125.359742 146.263178
## 
## $se.fit
##          1          2          3          4          5          6 
## 4.14474633 4.23322998 4.32228437 4.41187491 4.50196961 4.59253880 
##          7          8          9         10         11         12 
## 4.68355494 4.77499248 4.86682767 4.95903841 5.05160415 5.14450571 
##         13 
## 5.23772523 
## 
## $df
## [1] 68
## 
## $residual.scale
## [1] 17.1532717

As anticipated, this naive approach leads to poor predictions, with confidence intervals spanning roughly 124 to 146 cm. Note also that these are confidence intervals for the mean trend; prediction intervals for new yearly maxima would be considerably wider. The main reason for this poor performance is that key assumptions of the linear model (linearity, homoscedasticity, light-tailed errors) are violated when modeling extreme values, and least squares is highly sensitive to outliers. Consequently, we will turn to models rooted in extreme value theory to address these challenges more effectively.
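The confidence/prediction distinction can be checked directly with `predict.lm` on a toy linear trend mimicking the Venice setup (the numbers below are illustrative, not the actual series):

```r
# Sketch: confidence vs prediction intervals for a toy linear trend.
# Confidence bounds cover the fitted mean only; prediction bounds cover new
# observations and are always wider -- the relevant notion when forecasting
# yearly maxima. The slope/noise values roughly mimic the Venice fit.
set.seed(42)
yr <- 1940:2009
y  <- 100 + 0.28 * (yr - 1940) + rnorm(length(yr), sd = 17)
fit <- lm(y ~ yr)

new_years <- data.frame(yr = 2010:2022)
conf_int  <- predict(fit, new_years, interval = "confidence", level = 0.95)
pred_int  <- predict(fit, new_years, interval = "prediction", level = 0.95)

# prediction intervals are strictly wider than confidence intervals
all((pred_int[, "upr"] - pred_int[, "lwr"]) >
    (conf_int[, "upr"] - conf_int[, "lwr"]))   # TRUE
```

The extra width comes from the residual variance term: the prediction half-width scales with `sqrt(1 + h)` versus `sqrt(h)` for the mean, where `h` is the leverage of the new point.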

2.1.3 (c) Represent in the same graph the predicted yearly max values for the period 2010-2022, their pointwise confidence bounds and the observed values greater than 140 cm from the table below.

Code

# Stored the predictions in a dataframe 
venice_max_predict <- data.frame(
  PredictedValues = mod1_predict) %>% 
  mutate(year = c(2010:2022))
head(venice_max_predict)
##   PredictedValues.fit.fit PredictedValues.fit.lwr
## 1              132.452174              124.181461
## 2              132.732114              124.284836
## 3              133.012055              124.387071
## 4              133.291995              124.488236
## 5              133.571936              124.588395
## 6              133.851876              124.687608
##   PredictedValues.fit.upr PredictedValues.se.fit PredictedValues.df
## 1              140.722886             4.14474633                 68
## 2              141.179393             4.23322998                 68
## 3              141.637039             4.32228437                 68
## 4              142.095755             4.41187491                 68
## 5              142.555476             4.50196961                 68
## 6              143.016145             4.59253880                 68
##   PredictedValues.residual.scale year
## 1                     17.1532717 2010
## 2                     17.1532717 2011
## 3                     17.1532717 2012
## 4                     17.1532717 2013
## 5                     17.1532717 2014
## 6                     17.1532717 2015

#Plot the confidence intervals
plotCI(x = venice_max_predict$year,
       y = venice_max_predict$PredictedValues.fit.fit,
       li = venice_max_predict$PredictedValues.fit.lwr,
       ui = venice_max_predict$PredictedValues.fit.upr)

Code

#Create a new dataframe for the extreme values of 2010 - 2022 (table from Wikipedia)
max_real <- data.frame(year = c(2012, 2012, 2013, 2018, 2019, 2019, 2019, 2022), max_sea_level = c(143, 149, 143, 156, 187, 144, 154, 204))

# Create the ggplot object
venice_plot <- ggplot() +
  geom_point(data = max_real, aes(x = year, y = max_sea_level, color = "Observed"), alpha = 0.5, show.legend = TRUE, name = "Observed") +
  geom_point(data = venice_max_predict, aes(x = year, y = PredictedValues.fit.fit, color = "Predicted"), shape = 1, show.legend = TRUE, name = "Predicted") +
  labs(title = "Predicted Yearly Max Values vs Observed Values (>140cm)", x = "Year", y = "Sea Level") +
  scale_x_continuous(breaks = unique(c(venice_max_predict$year, max_real$year))) +
  scale_color_manual(name = "Data Type", values = c("Observed" = "red", "Predicted" = "black")) +
  theme(legend.title = element_blank())

# Convert the ggplot object to a Plotly interactive plot
interactive_venice_plot <- ggplotly(venice_plot)

# Add the confidence interval as a separate trace
interactive_venice_plot <- interactive_venice_plot %>%
  add_ribbons(data = venice_max_predict, x = ~year, ymin = ~PredictedValues.fit.lwr, ymax = ~PredictedValues.fit.upr, color = I("blue"), showlegend = TRUE, name = "Confidence Interval")

# Display the interactive plot
interactive_venice_plot

As previously noted and shown in the plot, predictions of extreme values from a linear model exhibit significant shortcomings. The predicted values follow a linear pattern and consistently fail to capture extreme events. There is one exception, the observation of 13 November 2019, for which the confidence interval of the predicted value overlaps the actual value. Even in this case, however, the model underestimates the observed value by approximately 9 cm, a substantial deviation.

To address these limitations, we are turning to the Generalized Extreme Value (GEV) distribution, a more suitable approach for modeling extreme events.

2.1.4 (d) Fit a GEV with constant parameters to the historical yearly max values. Fit a GEV with a time-varying location parameter. Compare the two nested models using a likelihood ratio test (LRT). Show diagnostic plots.

Code
########################################## extRemes ##########################################
# First we unlist our data frame
list_max_sea_levels <- unlist(venice_max$max_sea_level)
# GEV model with fixed location 
gev_fix <- fevd(list_max_sea_levels, type = "GEV", time.units= "year")
plot(gev_fix) ; gev_fix$results$par ; return.level(gev_fix) # shape is almost equal to 0.

##       location          scale          shape 
## 114.5629627966  14.5669041620  -0.0328669377
## fevd(x = list_max_sea_levels, type = "GEV", time.units = "year")
## get(paste("return.level.fevd.", newcl, sep = ""))(x = x, return.period = return.period)
## 
##  GEV model fitted to  list_max_sea_levels  
## Data are assumed to be  stationary 
## [1] "Return Levels for period units in years"
##   2-year level  20-year level 100-year level 
##     119.869893     155.784722     176.753120
# Compute confidence interval 
ci(gev_fix, type= "parameter") # CI includes 0
## fevd(x = list_max_sea_levels, type = "GEV", time.units = "year")
## 
## [1] "Normal Approx."
## 
##           95% lower CI       Estimate  95% upper CI
## location 110.724115809 114.5629627966 118.401809784
## scale     11.785048780  14.5669041620  17.348759544
## shape     -0.200251569  -0.0328669377   0.134517694
# GEV model with varying time-location using both linear and harmonic function 
yy <- 1:length(venice_max$year) 
gev_time_varying_linear <- fevd(venice_max$max_sea_level, venice_max, type = "GEV", time.units= "year", location.fun = ~ yy)
gev_time_varying_harmonic <- fevd(venice_max$max_sea_level, venice_max, type = "GEV", time.units= "year", location.fun = ~sin(2 * pi * (year - 1940)/70) + cos(2 * pi * (year - 1940)/70))

plot(gev_time_varying_linear) ; plot(gev_time_varying_harmonic) ;#gev_time_varying_linear$results$par ; gev_time_varying_harmonic$results$par;  return.level(gev_time_varying_linear) ; return.level(gev_time_varying_harmonic) 

Code
# Compare the two models using a likelihood ratio test.
# H0: no significant difference in model fit; H1: a significant difference in fit between the two models.
lrt_result_linear <- lr.test(gev_fix, gev_time_varying_linear) # we can almost reject the null hypothesis at the 95% significance level, which makes sense given our indications that the distribution is non-stationary
lrt_result_harmonic <- lr.test(gev_fix, gev_time_varying_harmonic) 
attributes(gev_time_varying_linear)
## $names
##  [1] "call"            "data.name"       "weights"        
##  [4] "in.data"         "missing.values"  "x"              
##  [7] "cov.data"        "method"          "type"           
## [10] "period.basis"    "par.models"      "const.loc"      
## [13] "const.scale"     "const.shape"     "n"              
## [16] "na.action"       "parnames"        "results"        
## [19] "initial.results"
## 
## $class
## [1] "fevd"
# AIC: 589.5372 (linear) --> lower AIC, so the linear model is sufficient
# AIC: 594.9147 (harmonic) --> more complex; we stick with the linear time-varying location

We initially observed that the shape parameter of the fixed Generalized Extreme Value (GEV) model was close to 0. This suggested that the extreme values closely follow a Gumbel distribution, which is characterized by an exponential tail. To confirm this hypothesis, we calculated a confidence interval using the ci function in the extRemes package. The resulting confidence interval included 0, indicating that the shape parameter was not significantly different from 0.
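Since the confidence interval for the shape parameter contains 0, the Gumbel restriction can also be tested directly against the full GEV. A minimal sketch, assuming `list_max_sea_levels` and `gev_fix` from the chunk above are in scope:

```r
# Gumbel is the GEV with shape fixed at 0, so the two models are nested
gev_gumbel <- fevd(list_max_sea_levels, type = "Gumbel", time.units = "year")
# LRT with 1 degree of freedom; a large p-value favors the simpler Gumbel
lr.test(gev_gumbel, gev_fix)
```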

Next, we aimed to fit a GEV model with a time-varying location parameter, exploring two functions: linear and harmonic. The linear function treats “Year” as a linear predictor, so the location parameter changes linearly over time. However, the likelihood ratio test indicated that the fixed (stationary) GEV model remained the better fit. This makes sense: although the maximum values show some patterns, a linear trend does not capture the full variability of the data.

On the other hand, the harmonic function, often used to model periodic phenomena, yielded a different outcome: it allowed us to reject the null hypothesis of the likelihood ratio test, indicating that the GEV model with a time-varying location parameter fits better. This suggests that our maximum values display non-stationary behavior.
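The statistic behind `lr.test` can be reproduced by hand from the two fitted objects. A sketch, assuming (as is the case for MLE fits in extRemes) that `$results$value` stores the negative log-likelihood:

```r
# Deviance D = 2 * (loglik_full - loglik_restricted), compared against a
# chi-squared distribution whose df is the difference in parameter counts
ll0 <- -gev_fix$results$value
ll1 <- -gev_time_varying_harmonic$results$value
D   <- 2 * (ll1 - ll0)
df  <- length(gev_time_varying_harmonic$results$par) - length(gev_fix$results$par)
pchisq(D, df = df, lower.tail = FALSE)  # p-value of the LRT
```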

2.1.5 (e) Add if necessary a time varying scale and or shape GEV parameter. Select the best model according to LRT.

Code
# Fit a GEV model with time-varying scale 
gev_time_varying_scale <-  fevd(venice_max$max_sea_level, venice_max, type = "GEV", time.units = "year", scale.fun = ~ yy) # AIC: 600
gev_time_varying_shape <- fevd(venice_max$max_sea_level, venice_max, type = "GEV", time.units = "year", shape.fun = ~ yy) # AIC:  600.7072 
gev_time_varying_scale_shape <- fevd(venice_max$max_sea_level, venice_max, type = "GEV", time.units = "year", shape.fun = ~ yy, scale.fun = ~ yy ) # AIC: 601.9
gev_time_varying_location_scale <- fevd(venice_max$max_sea_level, venice_max, type = "GEV", time.units = "year", scale.fun = ~ yy, location.fun =  ~ yy) # AIC: 590.6
gev_time_varying_location_shape <- fevd(venice_max$max_sea_level, venice_max, type = "GEV", time.units = "year", shape.fun = ~ yy, location.fun =  ~ yy) # AIC: 586
gev_time_varying_scale_shape_location <- fevd(venice_max$max_sea_level, venice_max, type = "GEV", time.units = "year", shape.fun = ~ yy, location.fun =  ~ yy, scale.fun = ~ yy ) # AIC: 586 
lrt_result <- lr.test(gev_time_varying_location_shape, gev_time_varying_scale_shape_location) # can't reject the null hypothesis so we decide to choose the gev_time_varying_location_shape 

After comparing the AIC of the models above, it appears that the best models are “gev_time_varying_location_shape” and “gev_time_varying_scale_shape_location”, for which we performed the LRT. It shows that the best model is the less complex one: “gev_time_varying_location_shape”.

2.1.6 (f) Predict the 13-year return level for each year from 2010 to 2022.

The obtained return levels are quite stable, slowly increasing over time from 148 to 149.
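For reference, the m-observation return level of a GEV with (possibly time-varying) location mu, scale sigma and shape xi is z_m = mu - (sigma/xi) * (1 - (-log(1 - 1/m))^(-xi)). A sketch of this computation; the parameter values in the last line are purely illustrative, not the estimates from our fit:

```r
# m-observation GEV return level; reduces to the Gumbel formula as xi -> 0
gev_return_level <- function(mu, sigma, xi, m) {
  yp <- -log(1 - 1/m)
  if (abs(xi) < 1e-8) mu - sigma * log(yp) else mu - (sigma / xi) * (1 - yp^(-xi))
}
# Illustrative parameter values only:
gev_return_level(mu = 140, sigma = 14, xi = -0.03, m = 13)
```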

2.1.7 (g) Calculate confidence bands for these predictions

Here we observe a slowly increasing trend in the predicted values. However, there is a noticeable problem with the method we used: the confidence intervals sit far too close to the return level, giving bounds that are too narrow to reveal anything significant. A more complete analysis would require de-trending the bootstrapped values, but we were limited by the time allotted for this work.
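A parametric bootstrap is one way to obtain wider, more honest bands: simulate from the fitted GEV, refit, and recompute the return level each time. A rough sketch under illustrative (not fitted) parameter values, using `evd::rgev` and `evd::fgev`; this is not the exact procedure we ran:

```r
library(evd)
set.seed(1)
boot_rl <- replicate(500, {
  sim <- rgev(70, loc = 115, scale = 14, shape = -0.03)  # illustrative parameters
  p   <- fgev(sim)$estimate                              # refit on simulated data
  yp  <- -log(1 - 1/13)
  unname(p["loc"] - (p["scale"] / p["shape"]) * (1 - yp^(-p["shape"])))
})
quantile(boot_rl, c(0.025, 0.975))  # pointwise 95% band for the 13-year level
```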

2.1.8 (h) Represent in the same graph your predictions of the 13-year return levels, their pointwise confidence intervals, the predicted yearly max values from the linear model and the observed values greater than 140 cm from the table below.

Code
rlci$Year = c(2010:2022)
return_levels_df$Year = c(2010:2022)

interactive_venice_plot %>% 
  add_lines(data = rlci, x = ~Year, y = ~return_level, color = I("red"), name = "Predicted values (model)") %>%
  add_ribbons(data = rlci, x = ~Year, ymin = ~Lower_CI, ymax = ~Upper_CI, color = I("red"), name  = "Confidence Interval (model)")

2.1.9 (i) Broadly speaking, each year there is a chance of 1/13 that the observed value is above the 13-year return level. Comment the results for both the linear model prediction and the GEV approach. Note that 12 of the 20 events occurred in the 21st century.

The main issue we see is that the observed values from 2010 to 2022 are much higher than the expected return level. This indicates that the models we are using severely underestimate how fast the sea level is currently rising, most notably its extreme values.

The linear model predictions are not reliable enough, as the model does not allow its parameters to vary over time. As a result, only one of the observed values falls inside the confidence intervals.

The return levels from the GEV model give better results overall, but still fall far short of reality. If we imagine similarly wide confidence intervals for the GEV (setting aside the limitations noted above) as for the linear model, up to 6 of the 8 values could fall inside its bounds, which would be reasonably good, except for the two highest values (12/11/2019 and 22/11/2022), which lie on a scale the model never anticipated.

In conclusion, the models we used so far are all incomplete and could not predict the massive change the sea levels underwent, most notably in the last three years, when values reached heights never seen before. This can be explained by the effects of global warming, which is accelerating the pace at which sea levels rise worldwide.

2.2 Part 2: Nuclear reactors

2.2.1 (a) Read in the data. Display a time series plot of the water level across the data range and try to identify times of highest levels.

Code
# Load the data
load(here::here("data/practical_2/niveau.Rdata"))
#niveau$Wert <- format(niveau$Wert, nsmall = 2)
# Convert to Date format
niveau$Zeitstempel <- as.Date(niveau$Zeitstempel)
niveau$Zeitpunkt_des_Auftretens <- as.Date(niveau$Zeitpunkt_des_Auftretens)
# Identifying points above the threshold
quantile(niveau$Wert, 0.95)
##      95% 
## 326.9117
threshold <- 326.9117 
above_threshold <- niveau[niveau$Wert > threshold, ]

# Create a base plot
p <- plot_ly(data = niveau, x = ~Zeitstempel, y = ~Wert, type = 'scatter', mode = 'lines',
             name = 'Water Level', hoverinfo = 'x+y')

# Add points above the threshold
p <- add_trace(p, data = above_threshold, x = ~Zeitstempel, y = ~Wert, mode = 'markers',
               marker = list(color = 'red', size = 10), name = 'Above Threshold')

# Customize layout
p <- layout(p, title = 'Time Series Plot of Water Levels',
            xaxis = list(title = 'Date'),
            yaxis = list(title = 'Water Level (m)'))

# Show the plot
p

To pinpoint the periods with the highest water levels, we’ve established a threshold of 326.91 meters (95% quantile). Upon analysis, the peak water level was observed in August 2007, surpassing this predefined threshold, with a value of 329.32 meters. Furthermore, in July 2021, there were many instances where the water levels exceeded the established threshold of 326.91 meters.
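The peak dates can also be pulled out directly rather than read off the plot; a small sketch, assuming `niveau` and `above_threshold` from the chunk above are in scope:

```r
# Single highest reading in the record
niveau[which.max(niveau$Wert), c("Zeitstempel", "Wert")]
# Number of threshold exceedances per month, to spot clustered episodes
table(format(above_threshold$Zeitstempel, "%Y-%m"))
```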

2.2.2 (b) Now display a histogram of the water levels. What do you observe about the distribution?

Code
# Plot histogram with custom axis
hist(niveau$Wert, breaks = 30, col = "skyblue", xlab = "Water Level (m)", 
     main = "Histogram of Water Levels", xaxt = 'n')  # Turn off default x-axis

# Adding a line for mean
mean_value <- mean(niveau$Wert)
abline(v = mean_value, col = "red", lwd = 2)

# Adding custom x-axis with decimals
axis(1, at = seq(floor(min(niveau$Wert)), ceiling(max(niveau$Wert)), by = 0.5), las = 2)

# Adding a legend
legend("topright", legend = paste("Mean =", round(mean_value, 2)), 
       col = "red", lwd = 2)

The histogram reveals a right-skewed distribution, indicating a higher frequency of lower water level readings and a gradual decrease in frequency as the levels rise. Most of the observed water levels concentrate below 327 meters. Notably, the distribution’s tail extends to the right, suggesting infrequent occurrences of substantially elevated water levels. These exceptional values could represent sporadic episodes of extreme water conditions.

2.2.3 (c) Explain how you would model the high water levels using a peaks-over-threshold approach.

The peaks-over-threshold (POT) approach is a statistical method used to model the extreme values in a dataset. It focuses on the tail of the distribution, where the extreme events reside. To model high water levels using the POT approach, we would follow these steps:

  1. Select a threshold: choose an appropriate threshold above which water level readings are considered extreme, typically where the mean excess (mean residual life) plot becomes roughly linear. The threshold should be high enough to exclude mundane variations but not so high that too few data points remain to model accurately.

  2. Extract exceedances: once the threshold is set, identify all the data points that exceed it. These are the “peaks” you will analyze.

  3. Fit a distribution: fit a Generalized Pareto Distribution (GPD) to the exceedances, typically via maximum likelihood estimation or Bayesian methods.

  4. Estimate the parameters: the shape parameter (xi) and the scale parameter (sigma), with the threshold playing the role of the location parameter. These define the behavior of the distribution’s tail.

  5. Model diagnostics: after fitting the model, perform checks to ensure the GPD fits the data well: analyze residual plots, conduct goodness-of-fit tests, and compare empirical against theoretical exceedance probabilities.

  6. Clustering: apply the Ferro-Segers estimate via the extremalindex function in R, which counts exceedances of a high threshold within time windows. An extremal index well below 1 (equivalently, a mean cluster size above 1) suggests clustering of exceedances and a need for declustering.

  7. Estimate return levels: estimate high quantiles (return levels) and their corresponding return periods. The return level is the value expected to be exceeded once every given period (e.g., a 100-year flood level).

  8. Assess uncertainty: construct confidence intervals for the return levels, e.g. with confint.
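The mean excess function behind step 1, e(u) = E[X - u | X > u], can also be computed by hand over a grid of candidate thresholds; a sketch using the `niveau` data loaded earlier:

```r
# Empirical mean excess over a grid of thresholds; a roughly linear stretch
# suggests the GPD is a reasonable model above that threshold
mean_excess <- function(x, us) sapply(us, function(u) mean(x[x > u] - u))
us <- quantile(niveau$Wert, seq(0.50, 0.98, by = 0.02))
plot(us, mean_excess(niveau$Wert, us), type = "b",
     xlab = "Threshold u", ylab = "Mean excess e(u)")
```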

2.2.4 (d) Comment on the aspect of clustering of extremes. How do you propose to measure and deal with clustering of the daily water levels?

Clustering of extremes refers to the tendency of extreme events to occur in sequences or bursts rather than being uniformly distributed over time. This is particularly relevant for environmental data such as water levels, where extreme events can depend on preceding conditions (e.g., three consecutive days of heavy rainfall in La Chaux-de-Fonds).

To measure and deal with clustering in daily water levels, we propose the following steps:

Measuring clustering:

  1. Run tests for serial dependence: use statistical tools such as the autocorrelation function (ACF) or the Ljung-Box test to detect serial dependence in the data above the threshold.
  2. Identify clusters: use the extremalindex function to determine whether exceedances arrive in clusters. The “declustering” run length (how much time must pass without an exceedance before a new cluster begins) also needs to be chosen.
  3. Apply declustering methods: separate the data into independent clusters of exceedances and intervening non-exceedances. A straightforward approach is to use a for loop to group extreme events by their year of occurrence, and then use the decluster function from the extRemes package.

Dealing with clustering:

  1. Decluster the data: before fitting a model to the exceedances, decluster them to remove the influence of serial dependence. Each cluster may be represented by its maximum value.
  2. Model with clustering: alternatively, fit models that explicitly account for the dependence between exceedances.
  3. Intensity measures: incorporate quantities such as the mean number of exceedances per cluster or the average cluster duration into the risk assessment.
  4. Time-varying thresholds: consider dynamic thresholds that adjust for seasonality or other known drivers of clustering, rather than a fixed threshold.
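The serial-dependence checks in step 1 can be sketched as follows; `x` and `u` are generic placeholders for a daily series and its threshold, not objects defined above:

```r
# ACF and Ljung-Box test on the exceedance indicator: significant
# autocorrelation means extremes tend to arrive in bursts
check_dependence <- function(x, u, lag = 10) {
  exc_ind <- as.numeric(x > u)   # 1 on exceedance days, 0 otherwise
  acf(exc_ind, main = "ACF of exceedance indicator")
  Box.test(exc_ind, lag = lag, type = "Ljung-Box")
}
```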

2.2.5 (e) Perform the analysis you suggest in c) and d) and compute the 50- and 100-year return levels. Explain your choice of threshold and provide an estimate of uncertainty for the return levels. Note: take care to compute the return level in yearly terms.

Code
# 1. Select Threshold: using mean residual life plot, quantile and threshrange.plot
plot(niveau$Wert)

Code
meplot(niveau$Wert, at = pretty(niveau$Wert, n = 10), labels = format(pretty(niveau$Wert, n = 10), nsmall = 1))

Code
mrlplot(niveau$Wert,main="Mean Residual Life Plot")

Code
quantile(niveau$Wert, 0.95) # 326.9117 
##      95% 
## 326.9117
threshrange.plot(niveau$Wert, r = c(325, 329), nint = 16) # Choose 327
## Error in get(paste("ci.", newcl, sep = ""))(x = x, alpha = alpha, type = type,  : 
##   ci: Sorry, unable to calculate the parameter covariance matrix.  Maybe try a different method.
## Error in if (ipars1["scale"] <= 0) ipars1["scale"] <- 0.00000001 : 
##   missing value where TRUE/FALSE needed

Code
threshold_clustering <- 326.9117  # select 327 as the threshold
# 2. Extract Exceedances
exceedances <- niveau$Wert[niveau$Wert > threshold_clustering]
# 3. Fit distribution over a range of thresholds and parameter estimation: GPD
fitMLE <- fevd(as.vector(exceedances), method = "MLE",  type="GP", threshold = threshold_clustering)
# 4. Model Diagnostics
plot(fitMLE)

Code
# 5. Check if there is clustering of extremes (Ferro-Segers estimate)
extremalindex(niveau$Wert, threshold_clustering) # extremal index = 0.149, i.e. a mean cluster size of 1 / 0.149 = 6.7; such strong clustering is expected for daily water level data, which is serially dependent, and indicates a need to decluster
## 
##  Intervals Method Estimator for the Extremal Index
## NULL
## 
##  theta.tilde used because there exist inter-exceedance times > 2.
##     extremal.index number.of.clusters         run.length 
##        0.149419682       60.000000000       13.000000000
# 6. Declustering 
years <- c()
k <- 1
for(i in 1:nrow(niveau)){
  if(is.na(niveau$Wert[i])){
    next
  }else{
    years[k] <- year(niveau$Zeitstempel[i])
    k <- k+1
  }
}
years <- years-1999
decl1 <- decluster(as.vector(niveau$Wert), threshold = threshold_clustering, groups=years, na.action=na.omit)
plot(decl1)

Code
# Add years and decl1 to niveau data frame 
niveau$years <- years
niveau$decl1 <- decl1
# 7. Estimate Return Levels with declustered data, using cmax = T
# Assuming 1 observation per day, we prefer using fpot from evd
rl_mle_evd_50 <- fpot(as.vector(niveau$Wert), cmax = T, threshold = threshold_clustering, npp = 365.25, mper=50)
rl_mle_evd_100 <- fpot(as.vector(niveau$Wert), cmax= T, threshold = threshold_clustering, npp = 365.25, mper=100)
rl_mle <- return.level(fitMLE, conf = 0.05, return.period= c(2,5,10,20,50,100), do.ci=T)
# Plotting the 50-year and 100-year return level
par(mfrow = c(2, 2))
plot(rl_mle_evd_50)
title(main = "50-Year Return Level", line = -1, col.main = "red")

Code
plot(rl_mle_evd_100)
title(main = "100-Year Return Level", line = -1, col.main = "red")

Code
par(mfrow = c(1, 1))

This threshold was determined by examining the mean residual life plot, where 326.9 meters emerged as a critical point: beyond it, the mean excess values begin to vary noticeably, marking a transition in the data. This choice is further supported by the fact that 326.9 meters corresponds to the 95th percentile of our dataset, i.e. a level exceeded by only the top 5% of our observations.

2.2.6 (f) Explain the drawbacks and advantages of using a block maxima method instead of the one used in c)-e).
The block maxima method fits a GEV to the maximum of each block (typically each year). Its main advantages are simplicity and the natural independence between block maxima, which removes the need for threshold selection and declustering. Its main drawback is wastefulness: only one value per block is retained, even when several large exceedances occur in the same block, so the POT approach used in c)-e) generally extracts more information from the same record. Conversely, POT requires a delicate threshold choice and explicit handling of clustered exceedances, as seen above.

2.3 Part 3: Night temperatures in Lausanne

2.3.1 (a) Read in the data for the daily night maximum temperatures in Lausanne. Subset the summer months (June to September).

Code
night_max <- read_csv(here::here("data/practical_2/nightmax.csv"))  
## New names:
## Rows: 8263 Columns: 4
## -- Column specification
## --------------------------------------------- Delimiter: "," chr
## (1): station dbl (2): ...1, night.max date (1): date
## i Use `spec()` to retrieve the full column specification for this
## data. i Specify the column types or set `show_col_types = FALSE` to
## quiet this message.
## * `` -> `...1`
night_min <- read_csv(here::here("data/practical_2/nightmin.csv"))
## New names:
## Rows: 8271 Columns: 4
## -- Column specification
## --------------------------------------------- Delimiter: "," chr
## (1): station dbl (2): ...1, night.min date (1): date
## i Use `spec()` to retrieve the full column specification for this
## data. i Specify the column types or set `show_col_types = FALSE` to
## quiet this message.
## * `` -> `...1`
# Remove rows with missing values
night_max <- na.omit(night_max)
night_min <- na.omit(night_min)
# Subset the data for summer months (June to September)
summer_night_max <- night_max[format(night_max$date,"%m") %in% c("06", "07", "08", "09"), ] 

2.3.2 (b) Assess whether extremes of the subsetted series in (a) occur in cluster.

Code
# Visualise the data: plot and histogram of the maximum temperatures for the summer months 
summer_night_max$date_str <- format(summer_night_max$date, "%d %b %Y") 
interactive_plot <- plot_ly(summer_night_max, x = ~date, y = ~night.max, type = 'scatter', mode = 'lines+markers',
                            hoverinfo = 'text', text = ~paste('Date:', date_str, '<br>Temp:', night.max)) %>%
  layout(title = "Summer Night Max Temperatures",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Night Max Temperature"))
interactive_plot
Code
hist(summer_night_max$night.max, breaks = 30, col = "skyblue", xlab = "Night Max", ylab = "Frequency", main = "Histogram of summer night max temperatures")

Code
# Have more information about the distribution 
min(summer_night_max$night.max) ; mean(summer_night_max$night.max); max(summer_night_max$night.max)
## [1] 5.6
## [1] 21.9083848
## [1] 34.6
quantile(summer_night_max$night.max, 0.95) # doesn't seem to be very right-skewed 
## 95% 
##  30
# Choose the threshold using the mrlplot()
mrlplot(summer_night_max$night.max, main="Mean residual")

Code
threshrange.plot(summer_night_max$night.max, r= c(20,30), nint = 20) # we are going to use 30 

Code
th_1 <- 30 # with this threshold we would have 5% of observations: 400 so seems good. 
# Visualise the threshold 
plot(summer_night_max$night.max, type = 'l')
abline(h=th_1,col=2) # looks good 

Code
# Assess the threshold
pot_mle <- fevd(summer_night_max$night.max, method = "MLE", type="GP", threshold=th_1) 
plot(pot_mle) 

Code
rl_mle <- return.level(pot_mle, conf = 0.05, return.period= c(2,5,10,20,50,100), do.ci=T) # diagnostic plot looks good and confidence interval become wider the longer the return level, which make sense 
# Return level plots with MLE 
par(mfcol=c(1,1))
plot(pot_mle, type="rl",
     main="Return Level Plot for Lausanne w/ MLE",
     ylim=c(0,200), pch=16)
loc <- as.numeric(return.level(pot_mle, conf = 0.05, return.period=50))
segments(50, 0, 50, loc, col= 'midnightblue',lty=6)
segments(0.01,loc,50, loc, col='midnightblue', lty=6)

Code
# Assess wether extremes occur in a cluster using extremalindex()
extremalindex(night_max$night.max, th_1) # 0.31 so 1/0.31 = 3.23 WE NEED TO DECLUSTER 
## 
##  Intervals Method Estimator for the Extremal Index
## NULL
## 
##  theta.tilde used because there exist inter-exceedance times > 2.
##     extremal.index number.of.clusters         run.length 
##        0.309860203       42.000000000        7.000000000

We have an approximate mean cluster size of 3.23 (extremal index of about 0.31), indicating that extreme values are clustered together. Therefore, we need to decluster.

2.3.3 (c) Decluster the data from (a) using a suitable threshold. Plot the resulting declustered data. (Hint: you may want to use the extRemes package.)

Code
# We need to create a vector of size summer_night_max that stores the year. 
years_part3 <- c()
k <- 1
for (i in 1:nrow(summer_night_max)) {
  if (is.na(summer_night_max$night.max[i])) {
    next
  } else {
    years_part3[k] <- year(summer_night_max$date[i])
    k <- k + 1
  }
}
years_part3 <- years_part3-1999

# Use decluster function of extRemes 
decl1 <- decluster(summer_night_max$night.max, threshold=th_1, groups=years_part3, na.action=na.omit) # vector and groups need to have the same size 
decl1 # we have 71 clusters 
## 
##  summer_night_max$night.max  declustered via runs  declustering.
## 
##  Estimated extremal index (intervals estimate) =  0.741502152 
## 
##  Number of clusters =  71 
## 
##  Run length =  1
plot(decl1) # shows in grey the points that are not retained in the selection 

2.3.4 (d) Fit a GPD to the data, both raw and declustered. Assess the quality of the fit.

Code
# Use fevd of the normal and declustered 
gpd_raw <- fevd(summer_night_max$night.max, threshold = th_1, type = "GP")
gpd_declustered <- fevd(decl1, threshold = th_1, type = "GP")
par(mfrow = c(2, 2))
plot(gpd_raw)
title(main = "GPD fitted to raw data", line = 1, col.main = "red")

Code
plot(gpd_declustered)
title(main = "GPD fitted to declustered data", line = 1, col.main = "red")

Code
par(mfrow = c(1, 1))
# Assess the fit: AIC for raw is 369, AIC for declustered data is 193

For the raw data, the adherence of the points to the 1:1 line in the quantile-quantile (Q-Q) plot suggests a good fit. Note, however, that it is the declustered data that attains the lower AIC (193 versus 369); since the two models are fitted to different datasets (all exceedances versus cluster maxima), their AICs are not directly comparable, and the declustered fit has the advantage of respecting the independence assumption underlying the GPD.
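The AIC values quoted in the comment can be read off programmatically; a sketch, assuming (as in recent extRemes versions) that `summary.fevd` returns its information criteria invisibly when called with `silent = TRUE`:

```r
aic_raw  <- summary(gpd_raw, silent = TRUE)$AIC
aic_decl <- summary(gpd_declustered, silent = TRUE)$AIC
c(raw = aic_raw, declustered = aic_decl)
```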

2.3.5 (e) Repeat the above analysis for the negatives of the daily nightly minimum temperatures for the winter months (November-February).

As we are dealing with minima (negative values), we need to invert the night.min values before applying the peaks-over-threshold method.

Note: to correctly subset the winter months in night.min, given that winter spans November and December of year t and January and February of year t+1, we need to adjust the selection process.

Code
# Assuming 'night_min' has a 'date' column in Date format
night_min$year <- as.numeric(format(night_min$date, "%Y"))
night_min$month <- format(night_min$date, "%m")
night_min <- night_min %>%
  mutate(
    year = as.numeric(format(date, "%Y")),
    month = format(date, "%m"),
    season_year = ifelse(month %in% c("01", "02"), year - 1, year)
  )
# Subset for winter months
winter_night_min <- night_min %>%
  filter(
    (month %in% c("11", "12") & season_year == year) |
    (month %in% c("01", "02") & season_year == year - 1)
  )
# Add the month in season-year 
winter_night_min <- winter_night_min %>% mutate(day = day(date), full_date = paste(day, month, season_year, sep = "-"))
# Convert into date format
winter_night_min <- winter_night_min %>% mutate(full_date = as.Date(full_date, format = "%d-%m-%Y"))

Note: to properly implement the peaks-over-threshold method on minima, the night.min values are negated so that the extreme lows become the largest values in the series.

Code
# Invert night.min as temperatures are negative 
winter_night_min$night.min <- -winter_night_min$night.min
# Visualise the data: plot and histogram
p <- ggplot(winter_night_min, aes(x = full_date , y = night.min)) +
  geom_line() +  # This adds the line type plot
  labs(x = "Date", y = "Night Min Temperature", title = "Winter Night Min Temperatures") +
  theme_minimal()  # Optional: adds a minimal theme
interactive_plot_winter <- ggplotly(p)
interactive_plot_winter
Code
hist(winter_night_min$night.min, breaks = 30, col = "skyblue", xlab = "Night Min Temperature", ylab = "Frequency", main = "Winter night min temperatures")

Code
# Have more information about the distribution 
min(winter_night_min$night.min); mean(winter_night_min$night.min); max(winter_night_min$night.min)
## [1] -13.6
## [1] -2.8185762
## [1] 13.4
quantile(winter_night_min$night.min, 0.95)
## 95% 
## 3.3
# Choose the threshold using the mrlplot()
mrlplot(winter_night_min$night.min, main="Mean residual")

Code
threshrange.plot(winter_night_min$night.min, r= c(0,6), nint =20)

Code
th_2 <- 3.3
# Visualise the threshold 
plot(winter_night_min$night.min, type = 'l')
abline(h=th_2,col=2) # looks good 

Code
# Assess the threshold
pot_mle_2 <- fevd(winter_night_min$night.min, method = "MLE", type="GP", threshold=th_2) 
plot(pot_mle_2) 

Code
rl_mle <- return.level(pot_mle_2, conf = 0.05, return.period= c(2,5,10,20,50,100), do.ci=T) # diagnostic plot looks good and confidence interval become wider the longer the return level, which make sense 
# Return level plots with MLE 
par(mfcol=c(1,1))
plot(pot_mle_2, type="rl",
     main="Return Level Plot for Lausanne w/ MLE",
     ylim=c(0,200), pch=16)
loc <- as.numeric(return.level(pot_mle_2, conf = 0.05, return.period=50))
segments(50, 0, 50, loc, col= 'midnightblue',lty=6)
segments(0.01,loc,50, loc, col='midnightblue', lty=6)

Code
# Assess wether extremes occur in a cluster using extremalindex()
extremalindex(winter_night_min$night.min, th_2) # 0.32 so 1/0.32 = 3.125
## 
##  Intervals Method Estimator for the Extremal Index
## NULL
## 
##  theta.tilde used because there exist inter-exceedance times > 2.
##     extremal.index number.of.clusters         run.length 
##         0.32095905        40.00000000         5.00000000
########################### Declustering ########################### 
# We need to create a vector of size night_min that stores the year. 
years_part3_2 <- c()
k <- 1
for (i in 1:nrow(winter_night_min)) {
  if (is.na(winter_night_min$night.min[i])) {
    next
  } else {
    years_part3_2[k] <- year(winter_night_min$date[i])
    k <- k + 1
  }
}
years_part3_2 <- years_part3_2-1999

# Use decluster function of extRemes 
decl2 <- decluster(winter_night_min$night.min, threshold=th_2, groups=years_part3_2, na.action=na.omit) # vector and groups need to have the same size 
decl2 # we have 53 clusters 
## 
##  winter_night_min$night.min  declustered via runs  declustering.
## 
##  Estimated extremal index (intervals estimate) =  0.832497359 
## 
##  Number of clusters =  53 
## 
##  Run length =  1
plot(decl2) # shows in grey the points that are not retained in the selection 

Code
# Fit GPD with clustered and declustered data 
# Use fevd of the normal and declustered 
gpd_raw_2 <- fevd(winter_night_min$night.min, threshold = th_2, type = "GP")
gpd_declustered_2 <- fevd(decl2, threshold = th_2, type = "GP")
par(mfrow = c(2, 2))
plot(gpd_raw_2)
title(main = "GPD fitted to raw data", line = 1, col.main = "red")

Code
plot(gpd_declustered_2)
title(main = "GPD fitted to declustered data", line = 1, col.main = "red")

Code
par(mfrow = c(1, 1))
# Assess the fit: AIC for raw is 38100, AIC for declustered data is 193

3 Practical 3

Code
source(here::here("script/setup.R"))

3.1 Part 1

3.1.1 (a) Read in the data and select only the winter temperature data. You can use the code lines of the file GetData.R to get the data and select only the winter values.

Code
##### Download daily NAO measurements
NAO.daily <- fread('ftp://ftp.cdc.noaa.gov/Public/gbates/teleconn/nao.reanalysis.t10trunc.1948-present.txt')
NAO.daily <- as.matrix(NAO.daily)
colnames(NAO.daily) <- c("year","month","day","NAO")

##### Download temperature data: be sure to work in the correct directory
months <- c(12,1,2) #keep only winter observations
temp_max_Zermatt           <- read_delim(here::here("data/practical_3/daily_maximum_Zermatt/order_107669_data.txt"), 
                                         ";", escape_double = FALSE, col_types = 
                                           cols(time = col_number()), 
                                         trim_ws = TRUE, skip = 1)

colnames(temp_max_Zermatt) <- c("station","time","temp")
temp_max_Zermatt           <- temp_max_Zermatt[-1,]
temp_max_Zermatt[,2]       <- as.Date(apply(temp_max_Zermatt[,2],1,as.character),
                                      "%Y%m%d")

temp_max_Montana           <- read_delim(here::here("data/practical_3/daily_maximum_Montana/order_107668_data.txt"), 
                                         ";", escape_double = FALSE, col_types = 
                                           cols(time = col_number()), 
                                         trim_ws = TRUE, skip = 1)

colnames(temp_max_Montana) <- c("station","time","temp")
temp_max_Montana           <- temp_max_Montana[-1,]
temp_max_Montana[,2]       <- as.Date(apply(temp_max_Montana[,2],1,as.character),
                                      "%Y%m%d")

###match the dates of the two time series
temp_max_Montana           <- temp_max_Montana[match(as.matrix(temp_max_Zermatt[,2]),
                                                     as.matrix(temp_max_Montana[,2])),]
temp_max_Montana           <- as.matrix(temp_max_Montana)
colnames(temp_max_Montana) <- c("station","time","temp")
temp_max_Zermatt           <- as.matrix(temp_max_Zermatt)
colnames(temp_max_Zermatt) <- c("station","time","temp")

###keep only winter dates
temp_max_Montana <- temp_max_Montana[which(month(as.POSIXlt(temp_max_Montana[,"time"], 
                                                            format="%Y-%m-%d")) %in% months),]
temp_max_Zermatt <- temp_max_Zermatt[which(month(as.POSIXlt(temp_max_Zermatt[,"time"], 
                                                            format="%Y-%m-%d")) %in% months),]


Date       <- function( length = 0 ){
  newDate = numeric( length )
  class(newDate) = "Date"
  return(newDate)
}

season_day                   <- yday(as.Date(temp_max_Montana[,2]))
season_day[season_day < 61]  <- season_day[season_day < 61] + 31
season_day[season_day > 334] <- season_day[season_day > 334]- 334

NAO.date <- Date(nrow(NAO.daily))
for(i in 1:nrow(NAO.daily)){
  NAO.date[i] <- as.Date(paste(as.character(NAO.daily[i,1]),"-",as.character(NAO.daily[i,2]),
                               "-",as.character(NAO.daily[i,3]),sep=""),format="%Y-%m-%d")
}
NAO.daily <- mutate(as_tibble(NAO.daily), date = make_date(year, month, day))

index <- as.Date(intersect(as.Date(temp_max_Montana[,2]),as.Date(NAO.date)))

nao <- NAO.daily[which(NAO.daily$date %in% index), 4]
nao <- as.vector(nao)

#Montana
x_Montana          <- data.frame("time"=temp_max_Montana[,2],
                                 "nao"=nao,
                                 "d"=season_day,
                                 "temp"=temp_max_Montana[,3])

as.numeric.factor  <- function(x) {as.numeric(levels(x))[x]}
x_Montana[,"temp"] <- as.numeric(x_Montana[,"temp"])
x_Montana[,"time"] <- as.numeric(year(as.POSIXlt(x_Montana[,"time"], format="%Y-%m-%d")))
x_Montana[,"time"] <- (x_Montana[,"time"]-min(x_Montana[,"time"]))/(max(x_Montana[,"time"])-
                                                                      min(x_Montana[,"time"]))

#Zermatt
x_Zermatt          <- data.frame("time"=temp_max_Zermatt[,2],
                                 "nao"=nao,
                                 "d"=season_day,
                                 "temp"=temp_max_Zermatt[,3])

as.numeric.factor  <- function(x) {as.numeric(levels(x))[x]}
x_Zermatt[,"temp"] <- as.numeric(x_Zermatt[,"temp"])
x_Zermatt[,"time"] <- as.numeric(year(as.POSIXlt(x_Zermatt[,"time"], format="%Y-%m-%d")))
x_Zermatt[,"time"] <- (x_Zermatt[,"time"]-min(x_Zermatt[,"time"]))/(max(x_Zermatt[,"time"])-
                                                                      min(x_Zermatt[,"time"]))

Z <- data.frame("Montana"=x_Montana[,"temp"] , "Zermatt"=x_Zermatt[,"temp"], "NAO"=x_Zermatt[,"NAO"])

3.1.2 (b) Represent the data and especially scatterplots of Zermatt values against Montana’s values. Same with Montana values against NAO values. Comment.

Code
# Scatterplot of Zermatt values against Montana values
plot(Z$Montana, Z$Zermatt, xlab = "Montana Temperature", ylab = "Zermatt Temperature", 
     main = "Zermatt vs Montana Temperatures")
abline(lm(Z$Zermatt ~ Z$Montana), col = "red") 
legend("topright", 
       legend = "Fitted Regression Line", 
       col = "red", 
       lty = 1, 
       cex = 0.8)

Code
# Scatterplot of Montana values against NAO values
plot(Z$Montana, Z$NAO, xlab = "Montana Temperature", ylab = "NAO Index", 
     main = "Montana Temperature vs NAO Index")
abline(lm(Z$NAO ~ Z$Montana), col = "red") # Add a linear regression line
legend("topright", 
       legend = "Fitted Regression", 
       col = "red", 
       lty = 1, 
       cex = 0.8)

  • Zermatt vs Montana Temperatures: the scatterplot displays a clear pattern, indicating a positive correlation between the temperatures in Zermatt and Montana. The red line, the fitted linear regression, closely follows the cloud of points, suggesting a strong positive linear association between the temperatures recorded at the two stations.

  • Montana Temperature vs NAO Index: the scatterplot presents a dense concentration of points with only a slight upward trend. The fitted regression line is nearly horizontal, suggesting at best a weak linear relationship between Montana temperatures and the NAO Index. Its slightly positive slope implies that changes in the NAO Index correspond to only small changes in Montana's temperature, so the NAO Index is at most a weak predictor of temperature in Montana.
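The visual impression given by such a fitted line can be quantified with `summary(lm(...))`, which reports the slope together with its standard error and p-value. A minimal base-R sketch on synthetic data (the variables below are simulated stand-ins, not the actual Montana/NAO series):

```r
# Synthetic stand-in for a weakly related pair: a small linear signal
# (slope 0.05) buried in noise, mirroring the weak trend discussed above.
set.seed(1)
montana <- rnorm(500, mean = 5, sd = 4)
nao     <- 0.05 * montana + rnorm(500, sd = 1)

fit <- lm(nao ~ montana)
summary(fit)$coefficients  # slope estimate, std. error, t value, p-value
```

Even a visually flat line can have a statistically significant slope when the sample is large, which is why the formal test matters.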

3.1.3 (c) Test the correlation between these two pairs of series.

Code
cor_Zermatt_Montana <- cor(Z$Zermatt, Z$Montana)
cat("Correlation between Zermatt and Montana Temperatures:", cor_Zermatt_Montana, "\n")
## Correlation between Zermatt and Montana Temperatures: 0.95456767
cor.test(Z$Zermatt, Z$Montana)
## 
##  Pearson's product-moment correlation
## 
## data:  Z$Zermatt and Z$Montana
## t = 194.7707, df = 3697, p-value < 0.000000000000000222
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.951614981 0.957344110
## sample estimates:
##        cor 
## 0.95456767
Code
# Correlation between Montana Temperatures and NAO Index
cor_Montana_NAO <- cor(Z$Montana, Z$NAO)
cat("Correlation between Montana Temperatures and NAO Index:", cor_Montana_NAO, "\n")
## Correlation between Montana Temperatures and NAO Index: 0.254067152
# correlation test 
cor.test(Z$Montana, Z$NAO) # significant 
## 
##  Pearson's product-moment correlation
## 
## data:  Z$Montana and Z$NAO
## t = 15.97214, df = 3697, p-value < 0.000000000000000222
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.223670701 0.283969870
## sample estimates:
##         cor 
## 0.254067152
  • Correlation between Montana and Zermatt temperatures: the coefficient of 0.95 indicates an exceptionally strong positive relationship, consistent with the expectation of closely linked climatic patterns. The cor.test confirms that this correlation is statistically significant, supporting the inference that temperatures in the two regions move in a closely synchronized manner.
  • Correlation between Montana and NAO: observed to be 0.25, a relatively weak positive correlation: there is some association between the two variables, but it is not substantial. The cor.test nevertheless indicates that this correlation is significant.

3.1.4 (d) Now concentrate on the extreme level and tail dependence. Represent χ and χ̄ for two pairs of series (Montana vs Zermatt and Montana vs NAO). You may want to use the function chiplot of the package evd. What do you observe?

Code
# Create a data frame for Zermatt vs Montana Temperatures
data_Zermatt_Montana <- data.frame(Zermatt = Z$Zermatt, Montana = Z$Montana)

# Chi Plot - Zermatt vs Montana Temperatures
chiplot(data_Zermatt_Montana, main1 = "Chi Plot - Zermatt vs Montana Temperatures")

Code
# Create a data frame for Montana Temperatures vs NAO Index
data_Montana_NAO <- data.frame(Montana = Z$Montana, NAO = Z$NAO)

# Chi Plot - Montana Temperatures vs NAO Index
chiplot(data_Montana_NAO, main1 = "Chi Plot - Montana Temperatures vs NAO Index")

  • Chi and Chi Bar plots of Zermatt vs Montana temperatures: indicate the existence of tail dependence, revealing a significant relationship between the extreme values. χ stays clearly above zero across the quantiles and χ̄ tends towards 1 in the upper tail. This pattern suggests that extreme temperatures in Zermatt and Montana are not independent but exhibit linked behavior in the tails of their respective distributions.

  • Chi and Chi Bar plots of Montana vs NAO: suggest an absence of tail dependence, with χ hovering around zero and χ̄ staying well below 1 in the upper tail, indicating that the two variables behave approximately independently at their extremes. Consequently, extreme values of Montana's temperatures and of the NAO index do not tend to occur simultaneously.
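The quantities plotted by `chiplot` can also be estimated directly at a fixed quantile. A base-R sketch of the empirical χ(u) and χ̄(u) computed on pseudo-observations (the helper name is ours):

```r
# Empirical chi(u) and chi-bar(u). chi > 0 in the limit indicates asymptotic
# dependence; chi-bar -> 1 under asymptotic dependence, < 1 otherwise.
chi_measures <- function(x, y, u = 0.95) {
  n  <- length(x)
  ux <- rank(x) / (n + 1)           # pseudo-observations of x
  uy <- rank(y) / (n + 1)           # pseudo-observations of y
  joint <- mean(ux > u & uy > u)    # empirical P(U > u, V > u)
  cuu   <- mean(ux <= u & uy <= u)  # empirical copula C(u, u)
  c(chi    = 2 - log(cuu) / log(u),
    chibar = 2 * log(1 - u) / log(joint) - 1)
}

u <- (1:1000) / 1001
chi_measures(u, u)   # comonotone pair: both measures equal 1
```

For a perfectly dependent pair both measures equal 1, which is the limiting behavior the Zermatt/Montana plot approaches.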

3.2 Part 2

3.2.1 (a) Read in the data. Then, plot the raw prices (Adj.Close) of both stocks. Discuss the stationarity assumption.

Code
engie <- read.csv(here::here("data/practical_3/engie.csv"))
veolia <- read.csv(here::here("data/practical_3/veolia.csv"))

# Data cleaning and plot
engie <- engie[-c(4456, 4489), ]
veolia <- veolia[-c(4456, 4489), ]

engie$Adj.Close <- as.numeric(engie$Adj.Close)
veolia$Adj.Close <- as.numeric(veolia$Adj.Close)

par(mfrow = c(1,2))
plot(engie$Adj.Close)
plot(veolia$Adj.Close)

Code

# ADF test to assess stationarity
adf_engie <- adf.test(engie$Adj.Close)   # p-value = 0.4466
adf_veolia <- adf.test(veolia$Adj.Close) # p-value = 0.8228

Since the p-value is higher than 0.05 for both series, we cannot reject the null hypothesis of non-stationarity.

3.2.2 (b) First, compute and plot the negative log returns in independent plots. Discuss the dates of occurrence of extreme events. Then, produce a scatterplot of the negative log returns of both stocks and discuss their (visual) dependence.

Code
# Create log return function
calculate_log_returns <- function(stocks) {
  log_returns <- -diff(log(stocks))
  return(log_returns)
}

# Calculate negative log returns for each index
log_returns_engie <- calculate_log_returns(engie$Adj.Close) 
plot(log_returns_engie)

Code

log_returns_veolia <- calculate_log_returns(veolia$Adj.Close) 
plot(log_returns_veolia)

Code

plot_engie <- plot_ly(x = engie$Date, y = engie$Adj.Close, type = "scatter", mode = "markers", name = "Engie") %>%
  layout(title = "Engie")

plot_veolia <- plot_ly(x = veolia$Date, y = veolia$Adj.Close, type = "scatter", mode = "markers", name = "Veolia") %>%
  layout(title = "Veolia")

combined_plot_2 <- subplot(plot_engie, plot_veolia)

layout(combined_plot_2, title = "Comparison of stock prices")
Code

df_log_returns <- data.frame(log_returns_engie, log_returns_veolia)

At first sight, the negative log returns of both stocks follow a similar pattern: the extremes occur at roughly the same periods and volatility is very high. The extreme events occurred in 2008 (subprime crisis), 2012-2013 (European energy crisis), 2020 (COVID-19), and 2022 (energy crisis).

Code
# Drawing a scatter plot
par(pty="s")
scatter.smooth(x = log_returns_engie, y = log_returns_veolia, xlab = "Engie negative log returns", ylab = "Veolia negative log returns")

The scatter plot shows a strong dependence between the negative log returns. No heavy tails are displayed, and the shape of the point cloud is close to elliptical.

3.2.3 (c) Plot the pseudo-samples obtained from estimating the marginal distributions by their empirical CDFs.

Code

# We compute and show the CDF for both negative log returns 
cdf_engie <- ecdf(log_returns_engie)
cdf_veolia <- ecdf(log_returns_veolia)

plot(cdf_engie, col = "red", main = "Empirical Cumulative Distributions")
lines(cdf_veolia, col ="blue")

legend("bottomright", legend = c("Engie", "Veolia"), col = c("red", "blue"), lty = 1)

Code

# Pseudo-observations (empirical-CDF transforms) for both return series
u1 <- pobs(log_returns_engie)
u2 <- pobs(log_returns_veolia)
data <- data.frame(engie_prob = u1, veolia_prob = u2)
data_matrix <- as.matrix(data)

pairs.panels(data, method = "kendall")

Both empirical cumulative distribution functions are very similar for Veolia and Engie. The pairs panel computed with the Kendall method shows a correlation coefficient of 0.42, indicating a moderately strong positive association.
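Under the hood, `pobs()` is essentially a rank transform, and Kendall's tau is invariant under it. A base-R sketch mirroring (not calling) the copula-package function, with an assumed toy vector:

```r
# pobs(x) maps each value to rank/(n+1), giving margins ~ Uniform(0,1)
pobs_sketch <- function(x) rank(x) / (length(x) + 1)

x <- c(0.3, -1.2, 2.5, 0.9)
pobs_sketch(x)                   # 0.4 0.2 0.8 0.6

# Kendall's tau depends only on ranks, so it is unchanged by any
# monotone transform of the margins:
y <- exp(x)                      # monotone function of x
cor(x, y, method = "kendall")    # 1: perfectly concordant
cor(pobs_sketch(x), pobs_sketch(y), method = "kendall")
```

This rank invariance is why copulas are fitted to pseudo-observations rather than to the raw returns.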

3.2.4 (d) Fit a Gaussian, a t, a Gumbel and a Clayton copula to the data. Select the copula model using an information criterion, such as the AIC.

Code
# Fit Gaussian copula
gaussian_copula <- normalCopula(dim = 2)
fit_gaussian <- fitCopula(gaussian_copula, data, method = "mpl")

# Fit t copula
t_copula <- tCopula(dim = 2, df = 3)
fit_t <- fitCopula(t_copula, data, method = "mpl")

# Fit Gumbel copula
gumbel_copula <- gumbelCopula(dim = 2)
fit_gumbel <- fitCopula(gumbel_copula, data, method = "mpl")

# Fit Clayton copula
clayton_copula <- claytonCopula(dim = 2, param = 2)   
fit_clayton <- fitCopula(clayton_copula, data, method = "mpl")

# Calculate AIC values
aic_values <- AIC(fit_gaussian, fit_t, fit_gumbel, fit_clayton)
aic_values
##              df         AIC
## fit_gaussian  1 -1939.37785
## fit_t         2 -2292.85549
## fit_gumbel    1 -2126.49971
## fit_clayton   1 -1246.12729
min(aic_values$AIC) ### the t copula fits our data best
## [1] -2292.85549


BiCopSelect(data[,1], data[,2], familyset=NA)
## Bivariate copula: t (par = 0.61, par2 = 3.92, tau = 0.42)

From the AIC results, we conclude that the t copula provides the best fit for our data, in agreement with the BiCopSelect output.
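As a reminder of what the criterion is doing: AIC = 2k − 2ℓ, where ℓ is the maximised log-likelihood and k the number of estimated parameters, so lower values are better. A base-R check of the identity on a simple model (an lm fit stands in for the copula fits):

```r
# Verify AIC = 2k - 2*logLik on a toy linear model (simulated data)
set.seed(42)
d   <- data.frame(x = rnorm(50))
d$y <- 1 + 2 * d$x + rnorm(50)
fit <- lm(y ~ x, data = d)

k <- attr(logLik(fit), "df")      # parameters counted by logLik (incl. sigma)
aic_manual <- 2 * k - 2 * as.numeric(logLik(fit))
all.equal(aic_manual, AIC(fit))   # TRUE
```

The penalty term 2k is why the two-parameter t copula must beat the one-parameter alternatives by a clear likelihood margin, as it does here.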

3.2.5 (e) Compute the theoretical tail dependence coefficients for the copulae fitted in (d), using the parameters obtained above.

Code
#Compute the tail dependence parameters through BiCopPar2TailDep and lambda functions
BiCopPar2TailDep(family = 2, par = fit_t@copula@parameters[1], par2 = fit_t@copula@parameters[2])
## $lower
## [1] 0.326725819
## 
## $upper
## [1] 0.326725819

We obtain lower and upper tail dependence coefficients of approximately 0.327, identical by construction since the t copula is symmetric. This indicates a positive dependence between the extreme values of Engie and Veolia.
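The coefficient above follows the closed form for a bivariate t copula, λ = 2 t_{ν+1}(−√((ν+1)(1−ρ)/(1+ρ))), where t_{ν+1} is the Student-t CDF with ν+1 degrees of freedom. A base-R check with the fitted parameters reported by BiCopSelect (ρ ≈ 0.61, ν ≈ 3.92); the helper name is ours:

```r
# Tail dependence coefficient of a bivariate t copula
t_tail_dep <- function(rho, nu) {
  2 * pt(-sqrt((nu + 1) * (1 - rho) / (1 + rho)), df = nu + 1)
}

t_tail_dep(rho = 0.61, nu = 3.92)  # approx 0.33, matching BiCopPar2TailDep
```

Note that λ is strictly positive for any ρ > −1 as long as ν is finite, which is why a t copula always implies some tail dependence, unlike the Gaussian copula.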

3.2.6 (f) Compute the non-parametric estimators of the tail dependence coefficients obtained from the data. Are these estimates close to the theoretical ones obtained with the best model selected in (d)?

Code

lambda(fit_t@copula)
##       lower       upper 
## 0.326725819 0.326725819
fitLambda(data_matrix, method = "t", lower.tail = F)
## $Lambda
##             [,1]        [,2]
## [1,] 1.000000000 0.327113536
## [2,] 0.327113536 1.000000000
## 
## $P
##             [,1]        [,2]
## [1,] 1.000000000 0.613299811
## [2,] 0.613299811 1.000000000
## 
## $Nu
##            [,1]       [,2]
## [1,]         NA 3.93305468
## [2,] 3.93305468         NA

chiplot(df_log_returns)

The results of fitLambda are very close to the theoretical coefficients of the fitted t copula. To corroborate this, we drew a chi plot, which graphically confirms the finding. From this information and the work above, we can conclude that there is a positive dependence between the extreme events of Engie's and Veolia's stocks.
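A fully non-parametric alternative is the simple empirical estimator λ̂_U(u) = P(U₁ > u, U₂ > u) / (1 − u) evaluated at a high quantile u. This is not the `method = "t"` estimator used by `fitLambda` above, just a base-R sketch of the idea (helper name is ours):

```r
# Simple empirical upper-tail dependence estimator on pseudo-observations
lambda_upper <- function(x, y, u = 0.95) {
  n  <- length(x)
  ux <- rank(x) / (n + 1)
  uy <- rank(y) / (n + 1)
  mean(ux > u & uy > u) / (1 - u)
}

s <- (1:1000) / 1001
lambda_upper(s, s)    # comonotone pair: close to 1
lambda_upper(s, -s)   # countermonotone pair: 0
```

In practice the estimate is sensitive to the choice of u, so it is usually inspected over a range of quantiles, much as the chi plot does.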